CYTOCHROME P450 MONOOXYGENASE AND
NADPH CYTOCHROME P450 OXIDOREDUCTASE GENES AND
PROTEINS RELATED TO THE OMEGA HYDROXYLASE COMPLEX OF
CANDIDA TROPICALIS AND METHODS RELATING THERETO

CROSS REFERENCE TO RELATED APPLICATIONS 
This application claims priority to U.S. Provisional Application Serial No.
60/103,099 filed October 5, 1998, and U.S. Provisional Application Serial No. 60/083,798 filed
May 1, 1998.

BACKGROUND 

1. Field of the Invention

The present invention relates to novel genes which encode enzymes of the ω-hydroxylase
complex in yeast Candida tropicalis strains. In particular, the invention relates to
novel genes encoding the cytochrome P450 and NADPH reductase enzymes of the ω-hydroxylase
complex in yeast Candida tropicalis, and to a method of quantitating the expression
of genes.

2. Description of the Related Art 

Aliphatic dioic acids are versatile chemical intermediates useful as raw
materials for the preparation of perfumes, polymers, adhesives and macrolide antibiotics. While
several chemical routes to the synthesis of long-chain alpha, omega-dicarboxylic acids are available,
the synthesis is not easy and most methods result in mixtures containing shorter chain lengths.
As a result, extensive purification steps are necessary. While it is known that long-chain dioic
acids can also be produced by microbial transformation of alkanes, fatty acids or esters thereof,
chemical synthesis has remained the most commercially viable route, due to limitations with the
current biological approaches.

Several strains of yeast are known to excrete alpha, omega-dicarboxylic acids as a
byproduct when cultured on alkanes or fatty acids as the carbon source. In particular, yeast
belonging to the genus Candida, such as C. albicans, C. cloacae, C. guilliermondii, C.
intermedia, C. lipolytica, C. maltosa, C. parapsilosis and C. zeylenoides are known to produce
such dicarboxylic acids (Agr. Biol. Chem. 35: 2033-2042 (1971)). Also, various strains of C.
tropicalis are known to produce dicarboxylic acids ranging in chain lengths from C11 through C18
(Okino et al., BM Lawrence, BD Mookherjee and BJ Willis (eds), in Flavors and Fragrances: A
World Perspective. Proceedings of the 10th International Conference of Essential Oils, Flavors
and Fragrances, Elsevier Science Publishers BV Amsterdam (1988)), and are the basis of several
patents as reviewed by Bühler and Schindler, in Aliphatic Hydrocarbons in Biotechnology, H. J.
Rehm and G. Reed (eds), Vol. 169, Verlag Chemie, Weinheim (1984).

Studies of the biochemical processes by which yeasts metabolize alkanes and fatty
acids have revealed three types of oxidation reactions: α-oxidation of alkanes to alcohols,
ω-oxidation of fatty acids to alpha, omega-dicarboxylic acids and the degradative β-oxidation of fatty
acids to CO₂ and water. The first two types of oxidations are catalyzed by microsomal enzymes
while the last type takes place in the peroxisomes. In C. tropicalis, the first step in the
ω-oxidation pathway is catalyzed by a membrane-bound enzyme complex (ω-hydroxylase
complex) including a cytochrome P450 monooxygenase and a NADPH cytochrome reductase.
This hydroxylase complex is responsible for the primary oxidation of the terminal methyl group
in alkanes and fatty acids (Gilewicz et al., Can. J. Microbiol. 25:201 (1979)). The genes which
encode the cytochrome P450 and NADPH reductase components of the complex have previously
been identified as P450ALK and P450RED respectively, and have also been cloned and
sequenced (Sanglard et al., Gene 76:121-136 (1989)). P450ALK has also been designated
P450ALK1. More recently, ALK genes have been designated by the symbol CYP and RED
genes have been designated by the symbol CPR. See, e.g., Nelson, Pharmacogenetics 6(1):1-42
(1996), which is incorporated herein by reference. See also Ohkuma et al., DNA and Cell
Biology 14:163-173 (1995), Seghezzi et al., DNA and Cell Biology 11:767-780 (1992) and
Kargel et al., Yeast 12:333-348 (1996), each incorporated herein by reference. For example,
P450ALK is also designated CYP52 according to the nomenclature of Nelson, supra. Fatty acids
are ultimately formed from alkanes after two additional oxidation steps, catalyzed by alcohol
oxidase (Kemp et al., Appl. Microbiol. and Biotechnol. 28: 370-374 (1988)) and aldehyde
dehydrogenase. The fatty acids can be further oxidized through the same or similar pathway to
the corresponding dicarboxylic acid. The ω-oxidation of fatty acids proceeds via the ω-hydroxy
fatty acid and its aldehyde derivative, to the corresponding dicarboxylic acid without the
requirement for CoA activation. However, both fatty acids and dicarboxylic acids can be
degraded, after activation to the corresponding acyl-CoA ester through the β-oxidation pathway
in the peroxisomes, leading to chain shortening. In mammalian systems, both fatty acid and
dicarboxylic acid products of ω-oxidation are activated to their CoA-esters at equal rates and are
substrates for both mitochondrial and peroxisomal β-oxidation (J. Biochem., 102:225-234
(1987)). In yeast, β-oxidation takes place solely in the peroxisomes (Agr. Biol. Chem. 49:1821-1828
(1985)).

The production of dicarboxylic acids by fermentation of unsaturated C14-C16
monocarboxylic acids using a strain of the species C. tropicalis is disclosed in U.S. Patent
4,474,882. The unsaturated dicarboxylic acids correspond to the starting materials in the number
and position of the double bonds. Similar processes in which other special microorganisms are
used are described in U.S. Patents 3,975,234 and 4,339,536, in British Patent Specification
1,405,026 and in German Patent Publications 21 64 626, 28 53 847, 29 37 292, 29 51 177, and
21 40 133.

Cytochromes P450 (P450s) are terminal monooxidases of a
multicomponent enzyme system as described above. They comprise a superfamily of proteins
which exist widely in nature, having been isolated from a variety of organisms as described, e.g.,
in Nelson, supra. These organisms include various mammals, fish, invertebrates, plants,
mollusks, crustaceans, lower eukaryotes and bacteria (Nelson, supra). First discovered in rodent
liver microsomes as a carbon-monoxide binding pigment as described, e.g., in Garfinkel, Arch.
Biochem. Biophys. 77:493-509 (1958), which is incorporated herein by reference, P450s were
later named based on their absorption at 450 nm in a reduced-CO coupled difference spectrum
as described, e.g., in Omura et al., J. Biol. Chem. 239:2370-2378 (1964), which is incorporated
herein by reference.

P450s catalyze the metabolism of a variety of endogenous and exogenous
compounds (Nelson, supra). Endogenous compounds include steroids, prostanoids, eicosanoids,
fat-soluble vitamins, fatty acids, mammalian alkaloids, leukotrienes, biogenic amines and
phytoalexins (Nelson, supra). P450 metabolism involves such reactions as epoxidation,
hydroxylation, dealkylation, N-hydroxylation, sulfoxidation, desulfuration and reductive
dehalogenation. These reactions generally make the compound more water soluble, which is
conducive to excretion, and more electrophilic. These electrophilic products can have
detrimental effects if they react with DNA or other cellular constituents. However, they can react
through conjugation with low molecular weight hydrophilic substances resulting in
glucuronidation, sulfation, acetylation, amino acid conjugation or glutathione conjugation
typically leading to inactivation and elimination as described, e.g., in Klaassen et al., Toxicology,
3rd ed., Macmillan, New York, 1986, incorporated herein by reference.

P450s are heme thiolate proteins consisting of a heme moiety bound to a single
polypeptide chain of 45,000 to 55,000 Da. The iron of the heme prosthetic group is located at
the center of a protoporphyrin ring. Four ligands of the heme iron can be attributed to the
porphyrin ring. The fifth ligand is a thiolate anion from a cysteinyl residue of the polypeptide.
The sixth ligand is probably a hydroxyl group from an amino acid residue, or a moiety with a
similar field strength such as a water molecule as described, e.g., in Goeptar et al., Critical
Reviews in Toxicology 25(1):25-65 (1995), incorporated herein by reference.

Monooxygenation reactions catalyzed by cytochromes P450 in a eukaryotic
membrane-bound system require the transfer of electrons from NADPH to P450 via NADPH-cytochrome
P450 reductase (CPR) as described, e.g., in Taniguchi et al., Arch. Biochem.
Biophys. 232:585 (1984), incorporated herein by reference. CPR is a flavoprotein of
approximately 78,000 Da containing 1 mol of flavin adenine dinucleotide (FAD) and 1 mol of
flavin mononucleotide (FMN) per mole of enzyme as described, e.g., in Potter et al., J. Biol.
Chem. 258:6906 (1983), incorporated herein by reference. The FAD moiety of CPR is the site of
electron entry into the enzyme, whereas FMN is the electron-donating site to P450 as described,
e.g., in Vermilion et al., J. Biol. Chem. 253:8812 (1978), incorporated herein by reference. The
overall reaction is as follows:

H⁺ + RH + NADPH + O₂ → ROH + NADP⁺ + H₂O



Binding of a substrate to the catalytic site of P450 apparently results in a
conformational change initiating electron transfer from CPR to P450. Subsequent to the transfer
of the first electron, O₂ binds to the Fe²⁺-P450 substrate complex to form Fe³⁺-P450-substrate
complex. This complex is then reduced by a second electron from CPR, or, in some cases,
NADH via cytochrome b5 and NADH-cytochrome b5 reductase as described, e.g., in Guengerich
et al., Arch. Biochem. Biophys. 205:365 (1980), incorporated herein by reference. One atom of
this reactive oxygen is introduced into the substrate, while the other is reduced to water. The
oxygenated substrate then dissociates, regenerating the oxidized form of the cytochrome P450 as
described, e.g., in Klaassen, Amdur and Doull, Casarett and Doull's Toxicology, Macmillan, New
York (1986), incorporated herein by reference.

The P450 reaction cycle can be short-circuited in such a way that O₂ is reduced to
O₂⁻ and/or H₂O₂ instead of being utilized for substrate oxygenation. This side reaction is often
referred to as the "uncoupling" of cytochrome P450 as described, e.g., in Kuthen et al., Eur. J.
Biochem. 126:583 (1982) and Poulos et al., FASEB J. 6:674 (1992), both of which are
incorporated herein by reference. The formation of these oxygen radicals may lead to oxidative
cell damage as described, e.g., in Mukhopadhyay, J. Biol. Chem. 269(18):13390-13397 (1994)
and Ross et al., Biochem. Pharm. 49(7):979-989 (1995), both of which are incorporated herein
by reference. It has been proposed that cytochrome b5's effect on P450 binding to the CPR
results in a more stable complex which is less likely to become "uncoupled" as described, e.g., in
Yamazaki et al., Arch. Biochem. Biophys. 325(2):174-182 (1996), incorporated herein by
reference.

P450 families are assigned based upon protein sequence comparisons.
Notwithstanding a certain amount of heterogeneity, a practical classification of P450s into
families can be obtained based on deduced amino acid sequence similarity. P450s with amino
acid sequence similarity of between about 40-80% are considered to be in the same family, with
sequences of greater than about 55% similarity belonging to the same subfamily. Those with
sequence similarity of less than about 40% are generally listed as members of different P450 gene
families (Nelson, supra). A value of greater than about 97% is taken to indicate allelic variants
of the same gene, unless proven otherwise based on catalytic activity, sequence divergence in
non-translated regions of the gene sequence, or chromosomal mapping.
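
The thresholds above amount to a simple decision rule. The following is a minimal sketch in Python, assuming a single pairwise percent-similarity value is already available. The function name and the treatment of the exact boundary values are illustrative assumptions, and the caveats about catalytic activity, non-translated regions and chromosomal mapping are not modeled.

```python
def classify_p450_pair(percent_similarity: float) -> str:
    """Apply the approximate similarity thresholds quoted above to one pair of P450s."""
    if not 0.0 <= percent_similarity <= 100.0:
        raise ValueError("percent similarity must be between 0 and 100")
    if percent_similarity > 97.0:
        # Presumed allelic variants unless other evidence says otherwise.
        return "probable allelic variants"
    if percent_similarity > 55.0:
        return "same subfamily"
    if percent_similarity >= 40.0:
        return "same family"
    return "different families"


if __name__ == "__main__":
    # Hypothetical similarity values, chosen only to exercise each branch.
    for value in (98.5, 60.0, 45.0, 30.0):
        print(value, classify_p450_pair(value))
```

Because the text gives every threshold as "about", a real classification would treat values near a boundary as ambiguous rather than rely on a hard cutoff.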

The most highly conserved region is the HR2 consensus containing the invariant
cysteine residue near the carboxyl terminus which is required for heme binding as described, e.g.,
in Gotoh et al., J. Biochem. 93:807-817 (1983) and Motohashi et al., J. Biochem. 101:879-997
(1987), both of which are incorporated herein by reference. Additional consensus regions,
including the central region of helix I and the transmembrane region, have also been identified,
as described, e.g., in Goeptar et al., supra and Kalb et al., PNAS 85:7221-7225 (1988),
incorporated herein by reference, although the HR2 cysteine is the only invariant amino acid
among P450s.

Short chain (≤C12) aliphatic dicarboxylic acids (diacids) are important industrial
intermediates in the manufacture of diesters and polymers, and find application as
thermoplastics, plasticizing agents, lubricants, hydraulic fluids, agricultural chemicals,
pharmaceuticals, dyes, surfactants, and adhesives. The high price and limited availability of
short chain diacids are due to constraints imposed by the existing chemical synthesis.

Long-chain diacids (aliphatic α,ω-dicarboxylic acids with carbon numbers of 12
or greater, hereafter also referred to as diacids) (HOOC-(CH₂)n-COOH) are a versatile family of
chemicals with demonstrated and potential utility in a variety of chemical products including
plastics, adhesives, and fragrances. Unfortunately, the full market potential of diacids has not
been realized because chemical processes produce only a limited range of these materials at a
relatively high price. In addition, chemical processes for the production of diacids have a
number of limitations and disadvantages. All the chemical processes are restricted to the
production of diacids of specific carbon chain lengths. For example, the dodecanedioic acid
process starts with butadiene. The resulting product diacids are limited to multiples of
four-carbon lengths and, in practice, only dodecanedioic acid is made. The dodecanedioic process is
based on nonrenewable petrochemical feedstocks. The multireaction conversion process
produces unwanted byproducts, which result in yield losses, NOx pollution and heavy metal
wastes.
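
The general formula HOOC-(CH2)n-COOH fixes the composition of any saturated, unbranched diacid once its carbon number is known. The sketch below works this out for dodecanedioic acid. The function names are hypothetical, and the rounded standard atomic masses are assumptions that do not come from this specification.

```python
# Rounded standard atomic masses in g/mol (assumed values, not from the source text).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}


def diacid_composition(total_carbons: int) -> dict:
    """Atom counts for HOOC-(CH2)n-COOH, where n = total_carbons - 2."""
    if total_carbons < 2:
        raise ValueError("a dicarboxylic acid needs at least two carbons")
    methylene_units = total_carbons - 2
    # Each CH2 contributes two hydrogens; each COOH contributes one.
    return {"C": total_carbons, "H": 2 * methylene_units + 2, "O": 4}


def molecular_formula(total_carbons: int) -> str:
    counts = diacid_composition(total_carbons)
    return "".join(f"{element}{counts[element]}" for element in ("C", "H", "O"))


def molar_mass(total_carbons: int) -> float:
    counts = diacid_composition(total_carbons)
    return sum(ATOMIC_MASS[element] * count for element, count in counts.items())


if __name__ == "__main__":
    # Dodecanedioic acid: 12 carbons in total, so n = 10 methylene units.
    print(molecular_formula(12), round(molar_mass(12), 2))
```

The same functions apply to any chain length, which is the range of products the text says a butadiene-based chemical route cannot reach.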

Long-chain diacids offer potential advantages over shorter chain diacids, but their
high selling price and limited commercial availability prevent widespread growth in many of
these applications. Biocatalysis offers an innovative way to overcome these limitations with a
process that produces a wide range of diacid products from renewable feedstocks. However,
there is no commercially viable bioprocess to produce long chain diacids from renewable
resources.

SUMMARY OF THE INVENTION 

An isolated nucleic acid is provided which encodes a CPRA protein having the
amino acid sequence set forth in SEQ ID NO: 83. An isolated nucleic acid is also provided
which includes a coding region defined by nucleotides 1006-3042 as set forth in SEQ ID NO: 81.
An isolated protein is provided which includes an amino acid sequence as set forth in SEQ ID
NO: 83. A vector is provided which includes a nucleotide sequence encoding CPRA protein
including an amino acid sequence as set forth in SEQ ID NO: 83. A host cell is provided which
is transfected or transformed with the nucleic acid encoding CPRA protein having an amino acid
sequence as set forth in SEQ ID NO: 83. A method of producing a CPRA protein including an
amino acid sequence as set forth in SEQ ID NO: 83 is also provided which includes a)
transforming a suitable host cell with a DNA sequence that encodes the protein having the amino
acid sequence as set forth in SEQ ID NO: 83; and b) culturing the cell under conditions favoring
the expression of the protein.

An isolated nucleic acid is provided which encodes a CPRB protein having the
amino acid sequence set forth in SEQ ID NO: 84. An isolated nucleic acid is provided which
includes a coding region defined by nucleotides 1033-3069 as set forth in SEQ ID NO: 82. An
isolated protein is provided which includes an amino acid sequence as set forth in SEQ ID NO:
84. A vector is provided which includes a nucleotide sequence encoding CPRB protein
including an amino acid sequence as set forth in SEQ ID NO: 84. A host cell is provided which
is transfected or transformed with the nucleic acid encoding CPRB protein having an amino acid
sequence as set forth in SEQ ID NO: 84. A method of producing a CPRB protein including an
amino acid sequence as set forth in SEQ ID NO: 84 is provided which includes a) transforming a
suitable host cell with a DNA sequence that encodes the protein having the amino acid sequence
as set forth in SEQ ID NO: 84; and b) culturing the cell under conditions favoring the expression
of the protein.

An isolated nucleic acid is provided which encodes a CYP52A1A protein having
the amino acid sequence set forth in SEQ ID NO: 95. An isolated nucleic acid is provided
which includes a coding region defined by nucleotides 1177-2748 as set forth in SEQ ID NO: 85.
An isolated protein is provided which includes an amino acid sequence as set forth in SEQ ID
NO: 95. A vector is provided which includes a nucleotide sequence encoding CYP52A1A protein
including an amino acid sequence as set forth in SEQ ID NO: 95. A host cell is provided which
is transfected or transformed with the nucleic acid encoding CYP52A1A protein having an amino
acid sequence as set forth in SEQ ID NO: 95. A method of producing a CYP52A1A protein
including an amino acid sequence as set forth in SEQ ID NO: 95 is provided which includes a)
transforming a suitable host cell with a DNA sequence that encodes the protein having the amino
acid sequence as set forth in SEQ ID NO: 95; and b) culturing the cell under conditions favoring
the expression of the protein.

An isolated nucleic acid encoding a CYP52A2A protein is provided which has the
amino acid sequence set forth in SEQ ID NO: 96. An isolated nucleic acid is provided which
includes a coding region defined by nucleotides 1199-2767 as set forth in SEQ ID NO: 86. An
isolated protein is provided which includes an amino acid sequence as set forth in SEQ ID NO:
96. A vector is provided which includes a nucleotide sequence encoding CYP52A2A protein
including an amino acid sequence as set forth in SEQ ID NO: 96. A host cell is provided which
is transfected or transformed with the nucleic acid encoding CYP52A2A protein having an amino
acid sequence as set forth in SEQ ID NO: 96. A method of producing a CYP52A2A protein
including an amino acid sequence as set forth in SEQ ID NO: 96 is provided which includes a)
transforming a suitable host cell with a DNA sequence that encodes the protein having the amino
acid sequence as set forth in SEQ ID NO: 96; and b) culturing the cell under conditions favoring
the expression of the protein.

An isolated nucleic acid encoding a CYP52A2B protein is provided which has the
amino acid sequence set forth in SEQ ID NO: 97. An isolated nucleic acid is provided which
includes a coding region defined by nucleotides 1072-2640 as set forth in SEQ ID NO: 87. An
isolated protein is provided which includes an amino acid sequence as set forth in SEQ ID NO:
97. A vector is provided which includes a nucleotide sequence encoding CYP52A2B protein
including an amino acid sequence as set forth in SEQ ID NO: 97. A host cell is provided which
is transfected or transformed with the nucleic acid encoding CYP52A2B protein having an amino
acid sequence as set forth in SEQ ID NO: 97. A method of producing a CYP52A2B protein
including an amino acid sequence as set forth in SEQ ID NO: 97 is provided which includes a)
transforming a suitable host cell with a DNA sequence that encodes the protein having the amino
acid sequence as set forth in SEQ ID NO: 97; and b) culturing the cell under conditions favoring
the expression of the protein.

An isolated nucleic acid encoding a CYP52A3A protein is provided which has
the amino acid sequence set forth in SEQ ID NO: 98. An isolated nucleic acid is provided
which includes a coding region defined by nucleotides 1126-2748 as set forth in SEQ ID NO: 88.
An isolated protein is provided which includes an amino acid sequence as set forth in SEQ ID
NO: 98. A vector is provided which includes a nucleotide sequence encoding CYP52A3A
protein including an amino acid sequence as set forth in SEQ ID NO: 98. A host cell is provided
which is transfected or transformed with the nucleic acid encoding CYP52A3A protein having an
amino acid sequence as set forth in SEQ ID NO: 98. A method of producing a CYP52A3A
protein including an amino acid sequence as set forth in SEQ ID NO: 98 is provided which
includes a) transforming a suitable host cell with a DNA sequence that encodes the protein
having the amino acid sequence as set forth in SEQ ID NO: 98; and b) culturing the cell under
conditions favoring the expression of the protein.

An isolated nucleic acid encoding a CYP52A3B protein is provided having the
amino acid sequence as set forth in SEQ ID NO: 99. An isolated nucleic acid is provided which
includes a coding region defined by nucleotides 913-2535 as set forth in SEQ ID NO: 89. An
isolated protein is provided which includes an amino acid sequence as set forth in SEQ ID NO:
99. A vector is provided which includes a nucleotide sequence encoding CYP52A3B protein
including an amino acid sequence as set forth in SEQ ID NO: 99. A host cell is provided which
is transfected or transformed with the nucleic acid encoding CYP52A3B protein having an amino
acid sequence as set forth in SEQ ID NO: 99. A method of producing a CYP52A3B protein
including an amino acid sequence as set forth in SEQ ID NO: 99 is provided which includes a)
transforming a suitable host cell with a DNA sequence that encodes the protein having the amino
acid sequence as set forth in SEQ ID NO: 99; and b) culturing the cell under conditions favoring
the expression of the protein.

An isolated nucleic acid encoding a CYP52A5A protein is provided having the
amino acid sequence set forth in SEQ ID NO: 100. An isolated nucleic acid is provided which
includes a coding region defined by nucleotides 1103-2656 as set forth in SEQ ID NO: 90. An
isolated protein is provided which includes an amino acid sequence as set forth in SEQ ID NO:
100. A vector is provided which includes a nucleotide sequence encoding CYP52A5A protein
including an amino acid sequence as set forth in SEQ ID NO: 100. A host cell is provided
which is transfected or transformed with the nucleic acid encoding CYP52A5A protein having an
amino acid sequence as set forth in SEQ ID NO: 100. A method of producing a CYP52A5A
protein including an amino acid sequence as set forth in SEQ ID NO: 100 is provided which
includes a) transforming a suitable host cell with a DNA sequence that encodes the protein
having the amino acid sequence as set forth in SEQ ID NO: 100; and b) culturing the cell under
conditions favoring the expression of the protein.

An isolated nucleic acid encoding a CYP52A5B protein is provided having the
amino acid sequence as set forth in SEQ ID NO: 101. An isolated nucleic acid is provided
which includes a coding region defined by nucleotides 1142-2695 as set forth in SEQ ID NO: 91.
An isolated protein is provided which includes an amino acid sequence as set forth in SEQ ID
NO: 101. A vector is provided which includes a nucleotide sequence encoding CYP52A5B
protein including the amino acid sequence as set forth in SEQ ID NO: 101. A host cell is
provided which is transfected or transformed with the nucleic acid encoding CYP52A5B protein
having the amino acid sequence as set forth in SEQ ID NO: 101. A method of producing a
CYP52A5B protein including an amino acid sequence as set forth in SEQ ID NO: 101 is
provided which includes a) transforming a suitable host cell with a DNA sequence that encodes
the protein having the amino acid sequence as set forth in SEQ ID NO: 101; and b) culturing the
cell under conditions favoring the expression of the protein.

An isolated nucleic acid encoding a CYP52A8A protein is provided having the
amino acid sequence set forth in SEQ ID NO: 102. An isolated nucleic acid is provided which
includes a coding region defined by nucleotides 464-2002 as set forth in SEQ ID NO: 92. An
isolated protein is provided which includes an amino acid sequence as set forth in SEQ ID NO:
102. A vector is provided which includes a nucleotide sequence encoding CYP52A8A protein
including an amino acid sequence as set forth in SEQ ID NO: 102. A host cell is provided
which is transfected or transformed with the nucleic acid encoding CYP52A8A protein having an
amino acid sequence as set forth in SEQ ID NO: 102. A method of producing a CYP52A8A
protein including an amino acid sequence as set forth in SEQ ID NO: 102 is provided which
includes a) transforming a suitable host cell with a DNA sequence that encodes the protein
having the amino acid sequence as set forth in SEQ ID NO: 102; and b) culturing the cell under
conditions favoring the expression of the protein.

An isolated nucleic acid encoding a CYP52A8B protein is provided having the
amino acid sequence set forth in SEQ ID NO: 103. An isolated nucleic acid is provided which
includes a coding region defined by nucleotides 1017-2555 as set forth in SEQ ID NO: 93. An
isolated protein is provided which includes an amino acid sequence as set forth in SEQ ID NO:
103. A vector is provided which includes a nucleotide sequence encoding CYP52A8B protein
including an amino acid sequence as set forth in SEQ ID NO: 103. A host cell is provided
which is transfected or transformed with the nucleic acid encoding CYP52A8B protein having an
amino acid sequence as set forth in SEQ ID NO: 103. A method of producing a CYP52A8B
protein including an amino acid sequence as set forth in SEQ ID NO: 103 is provided which
includes a) transforming a suitable host cell with a DNA sequence that encodes the protein
having the amino acid sequence as set forth in SEQ ID NO: 103; and b) culturing the cell under
conditions favoring the expression of the protein.

An isolated nucleic acid encoding a CYP52D4A protein is provided having the
amino acid sequence set forth in SEQ ID NO: 104. An isolated nucleic acid is provided
including a coding region defined by nucleotides 767-2266 as set forth in SEQ ID NO: 94. An
isolated protein is provided which includes an amino acid sequence as set forth in SEQ ID NO:
104. A vector is provided which includes a nucleotide sequence encoding CYP52D4A protein
including an amino acid sequence as set forth in SEQ ID NO: 104. A host cell is provided
which is transfected or transformed with the nucleic acid encoding CYP52D4A protein having an
amino acid sequence as set forth in SEQ ID NO: 104. A method of producing a CYP52D4A
protein including an amino acid sequence as set forth in SEQ ID NO: 104 is provided which
includes a) transforming a suitable host cell with a DNA sequence that encodes the protein
having the amino acid sequence as set forth in SEQ ID NO: 104; and b) culturing the cell under
conditions favoring the expression of the protein.
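
The coding-region coordinates recited above can be checked for internal consistency, since a protein-coding region should span a whole number of codons. The sketch below tabulates the twelve coordinate pairs and shows how such a region would be sliced out of a longer sequence. It assumes the coordinates are 1-based and inclusive, which is the usual convention for sequence listings but is not stated in this passage. The helper names and the short demonstration sequence are hypothetical.

```python
# (nucleotide SEQ ID NO, first nucleotide, last nucleotide) as recited in the Summary.
CODING_REGIONS = {
    "CPRA": (81, 1006, 3042),
    "CPRB": (82, 1033, 3069),
    "CYP52A1A": (85, 1177, 2748),
    "CYP52A2A": (86, 1199, 2767),
    "CYP52A2B": (87, 1072, 2640),
    "CYP52A3A": (88, 1126, 2748),
    "CYP52A3B": (89, 913, 2535),
    "CYP52A5A": (90, 1103, 2656),
    "CYP52A5B": (91, 1142, 2695),
    "CYP52A8A": (92, 464, 2002),
    "CYP52A8B": (93, 1017, 2555),
    "CYP52D4A": (94, 767, 2266),
}


def coding_length(start: int, end: int) -> int:
    """Length in nucleotides of a 1-based, inclusive interval."""
    return end - start + 1


def extract_coding_region(sequence: str, start: int, end: int) -> str:
    """Slice a 1-based, inclusive interval out of a nucleotide string."""
    if start < 1 or end > len(sequence) or start > end:
        raise ValueError("coordinates fall outside the sequence")
    return sequence[start - 1:end]


if __name__ == "__main__":
    for gene, (seq_id, start, end) in CODING_REGIONS.items():
        codons, remainder = divmod(coding_length(start, end), 3)
        print(f"{gene}: SEQ ID NO: {seq_id}, {codons} codons, remainder {remainder}")
    # Hypothetical 13-nucleotide sequence used only to demonstrate the slicing.
    print(extract_coding_region("AAATGGCCTAAGG", 3, 11))
```

Each interval in the table is a multiple of three nucleotides long. Whether the final codon in each interval is the stop codon is not stated here and would need to be confirmed against the sequence listing itself.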

A method for discriminating members of a gene family by quantifying the amount
of target mRNA in a sample is provided which includes a) providing an organism containing a
target gene; b) culturing the organism with an organic substrate which causes upregulation in the
activity of the target gene; c) obtaining a sample of total RNA from the organism at a first point
in time; d) combining at least a portion of the sample of the total RNA with a known amount of
competitor RNA to form an RNA mixture, wherein the competitor RNA is substantially similar
to the target mRNA but has a lesser number of nucleotides compared to the target mRNA; e)
adding reverse transcriptase to the RNA mixture in a quantity sufficient to form corresponding
target DNA and competitor DNA; f) conducting a polymerase chain reaction in the presence of
at least one primer specific for at least one substantially non-homologous region of the target
DNA within the gene family, the primer also specific for the competitor DNA; g) repeating steps
(c-f) using increasing amounts of the competitor RNA while maintaining a substantially constant
amount of target RNA; h) determining the point at which the amount of target DNA is
substantially equal to the amount of competitor DNA; i) quantifying the results by comparing the
ratio of the concentration of unknown target to the known concentration of competitor; and j)
obtaining a sample of total RNA from the organism at another point in time and repeating steps
(d-i).
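
Steps g) through i) describe a titration: the competitor amount is varied against a constant amount of target, and the target quantity is read off at the point where the two products are equal. One common way to locate that point, assumed here rather than taken from the text, is to regress the logarithm of the target-to-competitor product ratio against the logarithm of the competitor amount and solve for a ratio of one. In the sketch below the function name and the synthetic signal values are hypothetical, and no correction is applied for the size difference between the target and competitor products.

```python
import math


def equivalence_point(competitor_amounts, target_signals, competitor_signals):
    """Estimate the competitor amount at which target and competitor products are equal.

    Fits log10(target/competitor signal) against log10(competitor amount) by
    ordinary least squares and returns the amount at which the fitted ratio is 1.
    """
    if not (len(competitor_amounts) == len(target_signals) == len(competitor_signals)):
        raise ValueError("all three series must have the same length")
    if len(competitor_amounts) < 2:
        raise ValueError("at least two titration points are required")

    xs = [math.log10(amount) for amount in competitor_amounts]
    ys = [math.log10(t / c) for t, c in zip(target_signals, competitor_signals)]

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or sxy == 0:
        raise ValueError("titration points do not define a usable slope")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # log10(ratio) = 0 at the equivalence point, so log10(amount) = -intercept / slope.
    return 10 ** (-intercept / slope)


if __name__ == "__main__":
    # Synthetic, idealized data: a true target amount of 5.0 (arbitrary units)
    # competing with increasing amounts of competitor for the same primers.
    true_target = 5.0
    amounts = [0.5, 1.0, 5.0, 10.0, 50.0]
    target_signals = [true_target / (true_target + a) for a in amounts]
    competitor_signals = [a / (true_target + a) for a in amounts]
    print(round(equivalence_point(amounts, target_signals, competitor_signals), 3))
```

Repeating the estimate on RNA taken at a later time point, as in step j), would give a second value whose ratio to the first reflects the change in expression of the target gene.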

A method for increasing production of a dicarboxylic acid is provided which
includes a) providing a host cell having a naturally occurring number of CPRA genes; b)
increasing, in the host cell, the number of CPRA genes which encode a CPRA protein having the
amino acid sequence as set forth in SEQ ID NO: 83; c) culturing the host cell in media
containing an organic substrate which upregulates the CPRA gene, to effect increased production
of dicarboxylic acid.

A method for increasing the production of a CPRA protein having an amino acid
sequence as set forth in SEQ ID NO: 83 is provided which includes a) transforming a host cell
having a naturally occurring amount of CPRA protein with an increased copy number of a CPRA
gene that encodes the CPRA protein having the amino acid sequence as set forth in SEQ ID NO:
83; and b) culturing the cell and thereby increasing expression of the protein compared with that
of a host cell containing a naturally occurring copy number of the CPRA gene.

A method for increasing production of a dicarboxylic acid is provided which
includes a) providing a host cell having a naturally occurring number of CPRB genes; b)
increasing, in the host cell, the number of CPRB genes which encode a CPRB protein having the
amino acid sequence as set forth in SEQ ID NO: 84; c) culturing the host cell in media
containing an organic substrate which upregulates the CPRB gene, to effect increased production
of dicarboxylic acid.

A method for increasing the production of a CPRB protein having an amino acid
sequence as set forth in SEQ ID NO: 84 is provided which includes a) transforming a host cell
having a naturally occurring amount of CPRB protein with an increased copy number of a CPRB
gene that encodes the CPRB protein having the amino acid sequence as set forth in SEQ ID NO:
84; and b) culturing the cell and thereby increasing expression of the protein compared with that
of a host cell containing a naturally occurring copy number of the CPRB gene.

A method for increasing production of a dicarboxylic acid is provided which
includes a) providing a host cell having a naturally occurring number of CYP52A1A genes; b)
increasing, in the host cell, the number of CYP52A1A genes which encode a CYP52A1A protein
having the amino acid sequence as set forth in SEQ ID NO: 95; c) culturing the host cell in media
containing an organic substrate which upregulates the CYP52A1A gene, to effect increased
production of dicarboxylic acid.

A method for increasing the production of a CYP52A1A protein having an amino 
acid sequence as set forth in SEQ ID NO: 95 is provided which includes a) transforming a host 
cell having a naturally occurring amount of CYP52A1A protein with an increased copy number of
a CYP52A1A gene that encodes the CYP52A1A protein having the amino acid sequence as set
forth in SEQ ID NO: 95; and b) culturing the cell and thereby increasing expression of the 
protein compared with that of a host cell containing a naturally occurring copy number of the 
CYP52A1A gene. 

A method for increasing production of a dicarboxylic acid is provided which
includes a) providing a host cell having a naturally occurring number of CYP52A2A genes; b)
increasing, in the host cell, the number of CYP52A2A genes which encode a CYP52A2A protein
having the amino acid sequence as set forth in SEQ ID NO: 96; c) culturing the host cell in media
containing an organic substrate which upregulates the CYP52A2A gene, to effect increased
production of dicarboxylic acid.

A method for increasing the production of a CYP52A2A protein having an amino 
acid sequence as set forth in SEQ ID NO: 96 is provided which includes a) transforming a host 
cell having a naturally occurring amount of CYP52A2A protein with an increased copy number of 
a CYP52A2A gene that encodes the CYP52A2A protein having the amino acid sequence as set
forth in SEQ ID NO: 96; and b) culturing the cell and thereby increasing expression of the
protein compared with that of a host cell containing a naturally occurring copy number of the 
CYP52A2A gene. 

A method for increasing production of a dicarboxylic acid is provided which 
includes a) providing a host cell having a naturally occurring number of CYP52A2B genes; b) 
increasing, in the host cell, the number of CYP52A2B genes which encode a CYP52A2B protein
having the amino acid sequence as set forth in SEQ ID NO: 97; c) culturing the host cell in media 
containing an organic substrate which upregulates the CYP52A2B gene, to effect increased 
production of dicarboxylic acid. 

A method for increasing the production of a CYP52A2B protein having an amino 
acid sequence as set forth in SEQ ID NO: 97 is provided which includes a) transforming a host
cell having a naturally occurring amount of CYP52A2B protein with an increased copy number of
a CYP52A2B gene that encodes the CYP52A2B protein having the amino acid sequence as set
forth in SEQ ID NO: 97; and b) culturing the cell and thereby increasing expression of the 
protein compared with that of a host cell containing a naturally occurring copy number of the 
CYP52A2B gene. 

A method for increasing production of a dicarboxylic acid is provided which
includes a) providing a host cell having a naturally occurring number of CYP52A3A genes; b)
increasing, in the host cell, the number of CYP52A3A genes which encode a CYP52A3A protein
having the amino acid sequence as set forth in SEQ ID NO: 98; c) culturing the host cell in media
containing an organic substrate which upregulates the CYP52A3A gene, to effect increased
production of dicarboxylic acid.

A method for increasing the production of a CYP52A3A protein having an amino 
acid sequence as set forth in SEQ ID NO: 98 is provided which includes a) transforming a host 
cell having a naturally occurring amount of CYP52A3A protein with an increased copy number of 
a CYP52A3A gene that encodes the CYP52A3A protein having the amino acid sequence as set
forth in SEQ ID NO: 98; and b) culturing the cell and thereby increasing expression of the
protein compared with that of a host cell containing a naturally occurring copy number of the 
CYP52A3A gene. 

A method for increasing production of a dicarboxylic acid is provided which 
includes a) providing a host cell having a naturally occurring number of CYP52A3B genes; b)
increasing, in the host cell, the number of CYP52A3B genes which encode a CYP52A3B protein
having the amino acid sequence as set forth in SEQ ID NO: 99; c) culturing the host cell in media 
containing an organic substrate which upregulates the CYP52A3B gene, to effect increased 
production of dicarboxylic acid. 

A method for increasing the production of a CYP52A3B protein having an amino
acid sequence as set forth in SEQ ID NO: 99 is provided which includes a) transforming a host
cell having a naturally occurring amount of CYP52A3B protein with an increased copy number of 
a CYP52A3B gene that encodes the CYP52A3B protein having the amino acid sequence as set 
forth in SEQ ID NO: 99; and b) culturing the cell and thereby increasing expression of the 
protein compared with that of a host cell containing a naturally occurring copy number of the
CYP52A3B gene.

A method for increasing production of a dicarboxylic acid is provided which
includes a) providing a host cell having a naturally occurring number of CYP52A5A genes; b) 
increasing, in the host cell, the number of CYP52A5A genes which encode a CYP52A5A protein 
having the amino acid sequence as set forth in SEQ ID NO: 100; c) culturing the host cell in 
media containing an organic substrate which upregulates the CYP52A5A gene, to effect increased
production of dicarboxylic acid. 

A method for increasing the production of a CYP52A5A protein having an amino 
acid sequence as set forth in SEQ ID NO: 100 is provided which includes a) transforming a host 
cell having a naturally occurring amount of CYP52A5A protein with an increased copy number of
a CYP52A5A gene that encodes the CYP52A5A protein having the amino acid sequence as set
forth in SEQ ID NO: 100; and b) culturing the cell and thereby increasing expression of the
protein compared with that of a host cell containing a naturally occurring copy number of the 
CYP52A5A gene. 

A method for increasing production of a dicarboxylic acid is provided which 
includes a) providing a host cell having a naturally occurring number of CYP52A5B genes; b)
increasing, in the host cell, the number of CYP52A5B genes which encode a CYP52A5B protein 
having the amino acid sequence as set forth in SEQ ID NO: 101; c) culturing the host cell in 
media containing an organic substrate which upregulates the CYP52A5B gene, to effect increased 
production of dicarboxylic acid.

A method for increasing the production of a CYP52A5B protein having an amino
acid sequence as set forth in SEQ ID NO: 101 is provided which includes a) transforming a host
cell having a naturally occurring amount of CYP52A5B protein with an increased copy number of 
a CYP52A5B gene that encodes the CYP52A5B protein having the amino acid sequence as set
forth in SEQ ID NO: 101; and b) culturing the cell and thereby increasing expression of the
protein compared with that of a host cell containing a naturally occurring copy number of the
CYP52A5B gene. 

A method for increasing production of a dicarboxylic acid is provided which 
includes a) providing a host cell having a naturally occurring number of CYP52A8A genes; b) 
increasing, in the host cell, the number of CYP52A8A genes which encode a CYP52A8A protein 
having the amino acid sequence as set forth in SEQ ID NO: 102; c) culturing the host cell in
media containing an organic substrate which upregulates the CYP52A8A gene, to effect increased
production of dicarboxylic acid. 

A method for increasing the production of a CYP52A8A protein having an amino 
acid sequence as set forth in SEQ ID NO: 102 is provided which includes a) transforming a host 
cell having a naturally occurring amount of CYP52A8A protein with an increased copy number of
a CYP52A8A gene that encodes the CYP52A8A protein having the amino acid sequence as set 
forth in SEQ ID NO: 102; and b) culturing the cell and thereby increasing expression of the 
protein compared with that of a host cell containing a naturally occurring copy number of the 
CYP52A8A gene. 

A method for increasing production of a dicarboxylic acid is provided which
includes a) providing a host cell having a naturally occurring number of CYP52A8B genes; b)
increasing, in the host cell, the number of CYP52A8B genes which encode a CYP52A8B protein
having the amino acid sequence as set forth in SEQ ID NO: 103; c) culturing the host cell in
media containing an organic substrate which upregulates the CYP52A8B gene, to effect increased
production of dicarboxylic acid.

A method for increasing the production of a CYP52A8B protein having an amino 
acid sequence as set forth in SEQ ID NO: 103 is provided which includes a) transforming a host 
cell having a naturally occurring amount of CYP52A8B protein with an increased copy number of 
a CYP52A8B gene that encodes the CYP52A8B protein having the amino acid sequence as set
forth in SEQ ID NO: 103; and b) culturing the cell and thereby increasing expression of the
protein compared with that of a host cell containing a naturally occurring copy number of the 
CYP52A8B gene. 

A method for increasing production of a dicarboxylic acid is provided which 
includes a) providing a host cell having a naturally occurring number of CYP52D4A genes; b) 
increasing, in the host cell, the number of CYP52D4A genes which encode a CYP52D4A protein
having the amino acid sequence as set forth in SEQ ID NO: 104; c) culturing the host cell in 
media containing an organic substrate which upregulates the CYP52D4A gene, to effect increased 
production of dicarboxylic acid. 

A method for increasing the production of a CYP52D4A protein having an amino 
acid sequence as set forth in SEQ ID NO: 104 is provided which includes a) transforming a host
cell having a naturally occurring amount of CYP52D4A protein with an increased copy number
of a CYP52D4A gene that encodes the CYP52D4A protein having the amino acid sequence as set
forth in SEQ ID NO: 104; and b) culturing the cell and thereby increasing expression of the
protein compared with that of a host cell containing a naturally occurring copy number of the 
CYP52D4A gene. 

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 is a schematic representation of cloning vector pTriplEx from 
Clontech™ Laboratories, Inc. Selected restriction sites within the multiple cloning site are
shown. 

Figure 2A is a map of the ZAP Express™ vector.

Figure 2B is a schematic representation of cloning phagemid vector pBK-CMV.

Figure 3 is a double stranded DNA sequence of a portion of the 5 prime coding
region of the CYP52A5A gene (SEQ ID NO: 36).

Figure 4 is a diagrammatic representation of highly conserved regions of CYP and 
CPR gene protein sequences. Helix I represents the putative substrate binding site and HR2
represents the heme binding region. The FMN, FAD and NADPH binding regions are indicated 
below the CPR gene. 

Figure 5 is a diagrammatic representation of the plasmid pHKM1 containing the
truncated CPRA gene present in the pTriplEx vector. A detailed restriction map of only the
sequenced region is shown at the top. The bar indicates the open reading frame. The direction of
transcription is indicated by an arrow under the open reading frame. 

Figure 6 is a diagrammatic representation of the plasmid pHKM4 containing the 
truncated CPRA gene present in the pTriplEx vector. A detailed restriction map of only the 
sequenced region is shown at the top. The bar indicates the open reading frame. The direction of 
transcription is indicated by an arrow under the open reading frame.

Figure 7 is a diagrammatic representation of the plasmid pHKM9 containing the 
CPRB gene (SEQ ID NO: 82) present in the pBK-CMV vector. A detailed restriction map of 
only the sequenced region is shown at the top. The bar indicates the open reading frame. The 
direction of transcription is indicated by an arrow under the open reading frame.

Figure 8 is a diagrammatic representation of the plasmid pHKM11 containing the
CYP52A1A gene (SEQ ID NO: 85) present in the pBK-CMV vector. A detailed restriction map
of only the sequenced region is shown at the top. The bar indicates the open reading frame. The
direction of transcription is indicated by an arrow under the open reading frame. 

Figure 9 is a diagrammatic representation of the plasmid pHKM12 containing the 
CYP52A8A gene (SEQ ID NO: 92) present in the pBK-CMV vector. A detailed restriction map 
of only the sequenced region is shown at the top. The bar indicates the open reading frame. The
direction of transcription is indicated by an arrow under the open reading frame. 

Figure 10 is a diagrammatic representation of the plasmid pHKM13 containing 
the CYP52D4A gene (SEQ ID NO: 94) present in the pBK-CMV vector. A detailed restriction 
map of only the sequenced region is shown at the top. The bar indicates the open reading frame. 
The direction of transcription is indicated by an arrow under the open reading frame.

Figure 11 is a diagrammatic representation of the plasmid pHKM14 containing
the CYP52A2B gene (SEQ ID NO: 87) present in the pBK-CMV vector. A detailed restriction 
map of only the sequenced region is shown at the top. The bar indicates the open reading frame. 
The direction of transcription is indicated by an arrow under the open reading frame.

Figure 12 is a diagrammatic representation of the plasmid pHKM15 containing
the CYP52A8B gene (SEQ ID NO: 93) present in the pBK-CMV vector. A detailed restriction
map of only the sequenced region is shown at the top. The bar indicates the open reading frame. 
The direction of transcription is indicated by an arrow under the open reading frame. 

Figures 13A-13D show the complete DNA sequences including regulatory and 
coding regions for the CPRA gene (SEQ ID NO: 81) and CPRB gene (SEQ ID NO: 82) from C.
tropicalis ATCC 20336. Figures 13A-13D show regulatory and coding region alignment of
these sequences. Asterisks indicate conserved nucleotides. Bold indicates protein coding 
nucleotides; the start and stop codons are underlined. 

Figure 14 shows the amino acid sequence of the CPRA (SEQ ID NO: 83) and 
CPRB (SEQ ID NO: 84) proteins from C. tropicalis ATCC 20336 and alignment of these amino
acid sequences. Asterisks indicate residues which are not conserved. 

Figures 15A-15M show the complete DNA sequences including regulatory and
coding regions for the following genes from C. tropicalis ATCC 20336: CYP52A1A (SEQ ID
NO: 85), CYP52A2A (SEQ ID NO: 86), CYP52A2B (SEQ ID NO: 87), CYP52A3A (SEQ ID NO:
88), CYP52A3B (SEQ ID NO: 89), CYP52A5A (SEQ ID NO: 90), CYP52A5B (SEQ ID NO: 91),
CYP52A8A (SEQ ID NO: 92), CYP52A8B (SEQ ID NO: 93), and CYP52D4A (SEQ ID NO: 94).
Figures 15A-15M show regulatory and coding region alignment of these sequences. Asterisks
indicate conserved nucleotides. Bold indicates protein coding nucleotides; the start and stop 
codons are underlined. 

Figures 16A-16C show the amino acid sequences of the CYP52A1A (SEQ
ID NO: 95), CYP52A2A (SEQ ID NO: 96), CYP52A2B (SEQ ID NO: 97), CYP52A3A (SEQ ID
NO: 98), CYP52A3B (SEQ ID NO: 99), CYP52A5A (SEQ ID NO: 100), CYP52A5B (SEQ ID
NO: 101), CYP52A8A (SEQ ID NO: 102), CYP52A8B (SEQ ID NO: 103) and CYP52D4A (SEQ
ID NO: 104) proteins from C. tropicalis ATCC 20336. Asterisks indicate identical residues and
dots indicate conserved residues.

Figure 17 is a diagrammatic representation of the pTAg PCR product cloning
vector (commercially available from R&D Systems, Minneapolis, MN).

Figure 18 is a plot of the log ratio (U/C) of unknown target DNA product to 
competitor DNA product versus the concentration of competitor mRNA. The plot is used to 
calculate the target messenger RNA concentration in a quantitative competitive reverse 
transcription polymerase chain reaction (QC-RT-PCR).

Figure 19 is a graph showing the relative induction of C. tropicalis ATCC 20962
CYP52A5A (SEQ ID NO: 90) by the addition of the fatty acid substrate Emersol® 267 to the 
growth medium. 

Figure 20 is a graph showing the induction of C. tropicalis ATCC 20962 CYP52
and CPR genes by Emersol® 267. P450 genes CYP52A3A (SEQ ID NO: 88), CYP52A3B (SEQ
ID NO: 89), and CYP52D4A (SEQ ID NO: 94) are expressed at levels below the detection level
of the QC-RT-PCR assay. 

Figure 21 is a scheme to integrate selected genes into the genome of Candida
tropicalis strains and to recover the URA3A selectable marker.

Figure 22 is a schematic representation of the transformation of C. tropicalis
H5343 ura- with CYP and/or CPR genes. Only one URA3 locus needs to be functional. There
are a total of 6 possible ura3 targets (5 ura3A loci: 2 pox4 disruptions, 2 pox5 disruptions, 1
ura3A locus; and 1 ura3B locus).

Figure 23 is the complete DNA sequence (SEQ ID NO: 105) encoding URA3A 
from C. tropicalis ATCC 20336 and the amino acid sequence of the encoded protein (SEQ ID
NO: 106).

Figure 24 is a schematic representation of the plasmid pURAin, the base vector
for integrating selected genes into the genome of C. tropicalis. The detailed construction of 
pURAin is described in the text. 

Figure 25 is a schematic representation of the plasmid pNEB193 cloning vector
(commercially available from New England Biolabs, Beverly, MA).

Figure 26 is a diagrammatic representation of the plasmid pPA15 containing the
truncated CYP52A2A gene present in the pTriplEx vector. A detailed restriction map of only the 
sequenced region is shown at the top. The bar indicates the open reading frame. The direction of 
transcription is indicated by an arrow under the open reading frame.

Figure 27 is a schematic representation of pURA2in. The base vector is
constructed in pNEB193, which contains the 8 bp recognition sequences for Asc I, Pac I and Pme
I. URA3A (SEQ ID NO: 105) and CYP52A2A (SEQ ID NO: 86) do not contain these 8 bp
recognition sites. URA3A is inverted so that the transforming fragment will attempt to
recircularize prior to integration. An Asc I/Pme I fragment was used to transform H5343 ura-.

Figure 28 shows a scheme to detect integration of the CYP52A2A gene (SEQ ID NO:
86) into the genome of H5343 ura-. In all cases, hybridization band intensity could reflect the
number of integrations. 

Figure 29 is a diagrammatic representation of the plasmid pPA57 containing the 
truncated CYP52A3A gene present in the pTriplEx vector. A detailed restriction map of only the 
sequenced region is shown at the top. The bar indicates the open reading frame. The direction of
transcription is indicated by an arrow under the open reading frame. 

Figure 30 is a diagrammatic representation of the plasmid pPA62 containing the 
truncated CYP52A3B gene present in the pTriplEx vector. A detailed restriction map of only the 
sequenced region is shown at the top. The bar indicates the open reading frame. The direction of 
transcription is indicated by an arrow under the open reading frame.

Figure 31 is a diagrammatic representation of the plasmid pPAL3 containing the
truncated CYP52A5A gene present in the pTriplEx vector. A detailed restriction map of only the
sequenced region is shown at the top. The bar indicates the open reading frame. The direction of 
transcription is indicated by an arrow under the open reading frame.

Figure 32 is a diagrammatic representation of the plasmid pPA5 containing the
truncated CYP52A5A gene present in the pTriplEx vector. A detailed restriction map of only the
sequenced region is shown at the top. The bar indicates the open reading frame. The direction of
transcription is indicated by an arrow under the open reading frame. 

Figure 33 is a diagrammatic representation of the plasmid pPA18 containing the
truncated CYP52D4A gene present in the pTriplEx vector. A detailed restriction map of only the
sequenced region is shown at the top. The bar indicates the open reading frame. The direction of
transcription is indicated by an arrow under the open reading frame. 

Figure 34 is a graph showing the expression of CYP52A1 (SEQ ID NO: 85), 
CYP52A2 (SEQ ID NO: 86) and CYP52A5 genes (SEQ ID NOS: 90 and 91) from C. tropicalis
ATCC 20962 in a fermentor run upon the addition of amounts of the substrate oleic acid or
tridecane in a spiking experiment.

Figure 35 depicts a scheme used for the extraction and analysis of diacids and 
monoacids from fermentation broths. 

Figure 36 is a graph showing the induction of expression of CYP52A1A, 
CYP52A2A and CYP52A5A in a fermentor run upon addition of the substrate octadecane. No 
induction of CYP52A3A or CYP52A3B was observed under these conditions.

DESCRIPTION OF THE PREFERRED EMBODIMENTS 
Diacid productivity is improved according to the present invention by selectively
increasing enzymes which are known to be important to the oxidation of organic substrates such
as fatty acids composing the desired feed. According to the present invention, ten CYP genes
and two CPR genes of C. tropicalis have been identified and characterized that relate to
participation in the ω-hydroxylase complex catalyzing the first step in the ω-oxidation pathway.
In addition, a novel quantitative competitive reverse transcription polymerase chain reaction
(QC-RT-PCR) assay is used to measure gene expression in the fermentor under conditions of
induction by one or more organic substrates as defined herein. Based upon QC-RT-PCR results,
three CYP genes, CYP52A1, CYP52A2 and CYP52A5, have been identified as being of greater
importance for the ω-oxidation of long chain fatty acids. Amplification of the CPR gene copy
number improves productivity. The QC-RT-PCR assay indicates that both CYP and CPR genes 
appear to be under tight regulatory control. 

In accordance with the present invention, a method for discriminating members of
a gene family by quantifying the amount of target mRNA in a sample is provided which
includes a) providing an organism containing a target gene; b) culturing the organism with an
organic substrate which causes upregulation in the activity of the target gene; c) obtaining a
sample of total RNA from the organism at a first point in time; d) combining at least a portion of
the sample of the total RNA with a known amount of competitor RNA to form an RNA mixture,
wherein the competitor RNA is substantially similar to the target mRNA but has a lesser number
of nucleotides compared to the target mRNA; e) adding reverse transcriptase to the RNA mixture
in a quantity sufficient to form corresponding target DNA and competitor DNA; f) conducting a
polymerase chain reaction in the presence of at least one primer specific for at least one
substantially non-homologous region of the target DNA within the gene family, the primer also
specific for the competitor DNA; g) repeating steps (c-f) using increasing amounts of the
competitor RNA while maintaining a substantially constant amount of target RNA; h)
determining the point at which the amount of target DNA is substantially equal to the amount of
competitor DNA; i) quantifying the results by comparing the ratio of the concentration of
unknown target to the known concentration of competitor; and j) obtaining a sample of total
RNA from the organism at another point in time and repeating steps (d-i).
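
Steps g) through i) amount to locating an equivalence point on a titration curve, which can be sketched numerically. The Python below is an illustrative sketch only, not the procedure used in the examples: it assumes that the log of the target-to-competitor product ratio (U/C) is fitted by ordinary least squares against the log of the competitor RNA amount, and that the target amount is read off where the fitted line crosses log(U/C) = 0. The function name estimate_target_amount, the units, and the sample numbers are all hypothetical.

```python
import math


def estimate_target_amount(competitor_amounts, ratios):
    """Estimate the target RNA amount from a competitor titration.

    competitor_amounts: known competitor RNA amounts (any consistent unit).
    ratios: measured target/competitor product ratios (U/C), one per amount.
    Returns (target_estimate, slope, intercept) of the log-log fit.
    Assumption: log10(U/C) is linear in log10(competitor amount).
    """
    if len(competitor_amounts) != len(ratios) or len(ratios) < 2:
        raise ValueError("need at least two paired measurements")
    if any(c <= 0 for c in competitor_amounts) or any(r <= 0 for r in ratios):
        raise ValueError("amounts and ratios must be positive")

    xs = [math.log10(c) for c in competitor_amounts]
    ys = [math.log10(r) for r in ratios]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Ordinary least-squares slope and intercept.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or sxy == 0:
        raise ValueError("titration points do not define a usable line")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Equivalence point: log10(U/C) = 0, i.e. target product equals competitor product.
    log_equivalence = -intercept / slope
    return 10 ** log_equivalence, slope, intercept


if __name__ == "__main__":
    # Hypothetical titration: competitor amounts (pg) and measured U/C ratios.
    competitor_pg = [0.5, 1.0, 2.0, 4.0, 8.0]
    measured_ratio = [4.1, 1.9, 1.05, 0.48, 0.26]
    estimate, slope, _ = estimate_target_amount(competitor_pg, measured_ratio)
    print(f"estimated target amount: {estimate:.2f} pg (fit slope {slope:.2f})")
```

Running the same calculation on RNA sampled at successive time points, as in step j), would give a time course of target mRNA amounts from which relative induction could be compared.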

In addition, modification of existing promoters and/or the isolation of alternative 
promoters provides increased expression of CYP and CPR genes. Strong promoters are obtained 
from at least four sources: random or specific modifications of the CYP52A2 promoter,
CYP52A5 promoter, CYP52A1 promoter, the selection of a strong promoter from available
Candida β-oxidation genes such as POX4 and POX5, or screening to select another suitable
Candida promoter. 

Promoter strength can be directly measured using QC-RT-PCR to measure CYP
and CPR gene expression in Candida cells isolated from fermentors. Enzymatic assays and
antibodies specific for CYP and CPR proteins are used to verify that increased promoter strength
is reflected by increased synthesis of the corresponding enzymes. Once a suitable promoter is
identified, it is fused to the selected CYP and CPR genes and introduced into Candida for
construction of a new improved production strain. It is contemplated that the coding region of
the CYP and CPR genes can be fused to suitable promoters or other regulatory sequences which
are well known to those skilled in the art.

In accordance with the present invention, studies on C. tropicalis ATCC 20336
have identified six unique CYP genes and four potential alleles. QC-RT-PCR analyses of cells
isolated during the course of the fermentation bioconversions indicate that at least three of the
CYP genes are induced by fatty acids and at least two of the CYP genes are induced by alkanes.
See Figure 34. Two of the CYP genes are highly induced, indicating participation in the
ω-hydroxylase complex which catalyzes the rate limiting step in the oxidation of fatty acids to the
corresponding diacids.

The biochemical characterization of each P450 enzyme herein is used to tailor
the C. tropicalis host for optimal diacid productivity and is used to select P450 enzymes to be
amplified based upon the fatty acid content of the feedstream. CYP gene(s) encoding P450
enzymes that have a low specific activity for the fatty acid or alkane substrate of choice are
targeted for inactivation, thereby reducing the physiological load on the cell.

Since it has been demonstrated that CPR can be limiting in yeast systems, the 
removal of non-essential P450s from the system can free electrons that are being used by non- 
essential P450s and make them available to the P450s important for diacid productivity. 
Moreover, the removal of non-essential P450s can make available other necessary but potentially
limiting components of the P450 system (i.e., available membrane space, heme and/or NADPH).

Diacid productivity is thus improved by selective integration, amplification, and 
overexpression of CYP and CPR genes in the C. tropicalis production host.

It should be understood that host cells into which one or more copies of desired
CYP and/or CPR genes have been introduced can be made to include such genes by any
technique known to those skilled in the art. For example, suitable host cells include procaryotes
such as Bacillus sp., Pseudomonas sp., Actinomycetes sp., Escherichia sp., Mycobacterium sp., and
eukaryotes such as yeast, algae, insect cells, plant cells and filamentous fungi. Suitable host
cells are preferably yeast cells such as Yarrowia, Debaryomyces, Saccharomyces,
Schizosaccharomyces, and Pichia and more preferably those of the Candida genus. Preferred
species of Candida are tropicalis, maltosa, apicola, paratropicalis, albicans, cloacae,
guillermondii, intermedia, lipolytica, parapsilosis and zeylenoides. Certain preferred strains of
Candida tropicalis are listed in U.S. Patent No. 5,254,466, incorporated herein by reference.

Vectors such as plasmids, phagemids, phages or cosmids can be used to transform 
or transfect suitable host cells. Host cells may also be transformed by introducing into a cell a
linear DNA vector(s) containing the desired gene sequence. Such linear DNA may be
advantageous when it is desirable to avoid introduction of non-native (foreign) DNA into the
cell. For example, DNA consisting of a desired target gene(s) flanked by DNA sequences which
are native to the cell can be introduced into the cell by electroporation, lithium acetate 
transformation, spheroplasting and the like. Flanking DNA sequences can include selectable 
markers and/or other tools for genetic engineering.

A suitable organic substrate herein can be any organic compound that is
biooxidizable to a mono- or polycarboxylic acid. Such a compound can be any saturated or
unsaturated aliphatic compound or any carbocyclic or heterocyclic aromatic compound having at
least one terminal methyl group, a terminal carboxyl group and/or a terminal functional group
which is oxidizable to a carboxyl group by biooxidation. A terminal functional group which is a
derivative of a carboxyl group may be present in the substrate molecule and may be converted to
a carboxyl group by a reaction other than biooxidation. For example, if the terminal group is an
ester that neither the wild-type C. tropicalis nor the genetically modified strains described herein
can hydrolyze to a carboxyl group, then a lipase can be added during
the fermentation step to liberate free fatty acids. Suitable organic substrates include, but are not
limited to, saturated fatty acids, unsaturated fatty acids, alkanes, alkenes, alkynes and
combinations thereof.

Alkanes are a type of saturated organic substrate which are useful herein. The 
alkanes can be linear or cyclic, branched or straight chain, substituted or unsubstituted. 
Particularly preferred alkanes are those having from about 4 to about 25 carbon atoms, examples
of which include but are not limited to butane, hexane, octane, nonane, dodecane, tridecane,
tetradecane, octadecane and the like. 

Examples of unsaturated organic substrates which can be used herein include but
are not limited to internal olefins such as 2-pentene, 2-hexene, 3-hexene, 9-octadecene and the
like; unsaturated carboxylic acids such as 2-hexenoic acid and esters thereof, oleic acid and esters
thereof including triglyceryl esters having a relatively high oleic acid content, erucic acid and
esters thereof including triglyceryl esters having a relatively high erucic acid content, ricinoleic
acid and esters thereof including triglyceryl esters having a relatively high ricinoleic acid content,
linoleic acid and esters thereof including triglyceryl esters having a relatively high linoleic acid
content; unsaturated alcohols such as 3-hexen-1-ol, 9-octadecen-1-ol and the like; unsaturated
aldehydes such as 3-hexen-1-al, 9-octadecen-1-al and the like. In addition to the above,
organic substrates which can be used herein include alicyclic compounds having at least one
internal carbon-carbon double bond and at least one terminal methyl group, a terminal carboxyl
group and/or a terminal functional group which is oxidizable to a carboxyl group by
biooxidation. Examples of such compounds include but are not limited to
3,6-dimethyl-1,4-cyclohexadiene; 3-methylcyclohexene; 3-methyl-1,4-cyclohexadiene and the like.

Examples of the aromatic compounds that can be used herein include but are not
limited to arenes such as o-, m-, p-xylene; o-, m-, p-methyl benzoic acid; dimethyl pyridine, and
the like. The organic substrate can also contain other functional groups that are biooxidizable to
carboxyl groups such as an aldehyde or alcohol group. The organic substrate can also contain
other functional groups that are not biooxidizable to carboxyl groups and do not interfere with
the biooxidation such as halogens, ethers, and the like.

Examples of saturated fatty acids which may be applied to cells incorporating the
present CYP and CPR genes include caproic, enanthic, caprylic, pelargonic, capric, undecylic,
lauric, myristic, pentadecanoic, palmitic, margaric, stearic, arachidic, behenic acids and
combinations thereof. Examples of unsaturated fatty acids which may be applied to cells
incorporating the present CYP and CPR genes include palmitoleic, oleic, erucic, linoleic,
linolenic acids and combinations thereof. Alkanes and fractions of alkanes may be applied
which include chain lengths from C12 to C24 in any combination. Examples of preferred fatty
acid mixtures are Emersol® 267 and Tallow, both commercially available from Henkel
Chemicals Group, Cincinnati, OH. The typical fatty acid composition of Emersol® 267 and
Tallow is as follows:

           TALLOW    E267
C14:0       3.5%     2.4%
C14:1       1.0%     0.7%
C15:0       0.5%      -
C16:0      25.5%     4.6%
C16:1       4.0%     5.7%
C17:0       2.5%      -
C17:1        -       5.7%
C18:0      19.5%     1.0%
C18:1      41.0%    69.9%
C18:2       2.5%     8.8%
C18:3        -       0.3%
C20:0       0.5%      -
C20:1        -       0.9%
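
As a quick consistency check on the typical Emersol® 267 composition, the following minimal sketch sums the listed percentages and separates saturated from unsaturated species. It uses only the E267 values that are also quoted for the oleic acid feedstream in Example 9; the dictionary name and helper function are illustrative and not part of the original disclosure.

```python
# Typical Emersol 267 composition in percent, keyed by "carbons:double bonds".
# Values are those listed for E267 in the composition table and in Example 9.
E267 = {
    "C14:0": 2.4, "C14:1": 0.7, "C16:0": 4.6, "C16:1": 5.7, "C17:1": 5.7,
    "C18:0": 1.0, "C18:1": 69.9, "C18:2": 8.8, "C18:3": 0.3, "C20:1": 0.9,
}


def double_bonds(label):
    """Return the number of double bonds encoded in a label such as 'C18:1'."""
    return int(label.split(":")[1])


total = sum(E267.values())
unsaturated = sum(pct for label, pct in E267.items() if double_bonds(label) > 0)
saturated = total - unsaturated

print(f"total = {total:.1f}%")
print(f"unsaturated = {unsaturated:.1f}%, saturated = {saturated:.1f}%")
```

The listed values account for the whole mixture, and the unsaturated species (chiefly C18:1) dominate, which is consistent with the description of Emersol® 267 as a commercial oleic acid.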



The following examples are meant to illustrate but not to limit the invention. All 
relevant microbial strains and plasmids are described in Table 1 and Table 2, respectively. 
Table 1. List of Escherichia coli and Candida tropicalis strains

E. coli STRAIN    GENOTYPE                                       SOURCE

XL1Blue-MRF'      endA1, gyrA96, hsdR17, lac-, recA1,            Stratagene, La Jolla, CA
                  relA1, supE44, thi-1, [F' lacIqZΔM15,
                  proAB, Tn10]

BM25.8            supE44, thi Δ(lac-proAB) [F' traD36,           Clontech, Palo Alto, CA
                  proAB+, lacIqZΔM15]
                  λimm434 (kanR) P1 (camR) hsdR (rK12- mK12+)

XLOLR             Δ(mcrA)183 Δ(mcrCB-hsdSMR-mrr)173              Stratagene, La Jolla, CA
                  endA1 thi-1 recA1 gyrA96 relA1 lac
                  [F' proAB lacIqZΔM15 Tn10 (TetR)] Su-
                  (nonsuppressing) λR (lambda resistant)

C. tropicalis     GENOTYPE                                       SOURCE
STRAIN

ATCC 20336        Wild-type                                      American Type Culture
                                                                 Collection, Rockville, MD

ATCC 750          Wild-type                                      American Type Culture
                                                                 Collection, Rockville, MD

ATCC 20962        ura3A/ura3B,                                   Henkel
                  pox4A::ura3A/pox4B::ura3A,
                  pox5::ura3A/pox5::URA3A

H5343 ura-        ura3A/ura3B,                                   Henkel
                  pox4A::ura3A/pox4B::ura3A,
                  pox5::ura3A/pox5::URA3A, ura3-

HDC1              ura3A/ura3B,                                   Henkel
                  pox4A::ura3A/pox4B::ura3A,
                  pox5::ura3A/pox5::URA3A,
                  ura3::URA3A-CYP52A2A

HDC5              ura3A/ura3B,                                   Henkel
                  pox4A::ura3A/pox4B::ura3A,
                  pox5::ura3A/pox5::URA3A,
                  ura3::URA3A-CYP52A3A

HDC10             ura3A/ura3B,                                   Henkel
                  pox4A::ura3A/pox4B::ura3A,
                  pox5::ura3A/pox5::URA3A,
                  ura3::URA3A-CPRB

HDC15             ura3A/ura3B,                                   Henkel
                  pox4A::ura3A/pox4B::ura3A,
                  pox5::ura3A/pox5::URA3A,
                  ura3::URA3A-CYP52A5A

HDC20             ura3A/ura3B,                                   Henkel
                  pox4A::ura3A/pox4B::ura3A,
                  pox5::ura3A/pox5::URA3A,
                  ura3::URA3A-CYP52A2A + CPRB
                  (CYP and CPR have opposite 5' to 3'
                  orientation with respect to each
                  other)

HDC23             ura3A/ura3B,                                   Henkel
                  pox4A::ura3A/pox4B::ura3A,
                  pox5::ura3A/pox5::URA3A,
                  ura3::URA3A-CYP52A2A + CPRB
                  (CYP and CPR have same 5' to 3'
                  orientation with respect to each
                  other)


Table 2. List of plasmids isolated from genomic libraries and constructed for use
in gene integrations.

Plasmid      Base vector   Insert                Insert size      Plasmid size     Description

pURAin       pNEB193       URA3A                 1706 bp          4399 bp          pNEB193 with the URA3A gene inserted in the
                                                                                   AscI - PmeI site, generating a PacI site
pURA2in      pURAin        CYP52A2A              2230 bp          6629 bp          pURAin containing a PCR CYP52A2A allele
                                                                                   containing PacI restriction sites
pURAREDBin   pURAin        CPRB                  3266 bp          7665 bp          pURAin containing a PCR CPRB allele
                                                                                   containing PacI restriction sites
pHKM1        pTriplEx      Truncated CPRA gene   Approx. 3.8 kb   Approx. 7.4 kb   A truncated CPRA gene obtained by first
                                                                                   screening library containing the 5'
                                                                                   untranslated region and 1.2 kb open
                                                                                   reading frame
pHKM4        pTriplEx      Truncated CPRA gene   Approx. 5 kb     Approx. 8.6 kb   A truncated CPRA gene obtained by screening
                                                                                   second library containing the 3'
                                                                                   untranslated region end sequence
pHKM9        pBC-CMV       CPRB gene             Approx. 5.3 kb   Approx. 9.8 kb   CPRB allele isolated from the third library
pHKM11       pBC-CMV       CYP52A1A              Approx. 5 kb     Approx. 9.5 kb   CYP52A1A isolated from the third library
pHKM12       pBC-CMV       CYP52A8A              Approx. 7.5 kb   Approx. 12 kb    CYP52A8A isolated from the third library
pHKM13       pBC-CMV       CYP52D4A              Approx. 7.3 kb   Approx. 11.8 kb  CYP52D4A isolated from the third library
pHKM14       pBC-CMV       CYP52A2B              Approx. 6 kb     Approx. 10.5 kb  CYP52A2B isolated from the third library
pHKM15       pBC-CMV       CYP52A8B              Approx. 6.6 kb   Approx. 11.1 kb  CYP52A8B isolated from the third library
pPAL3        pTriplEx      CYP52A5A              4.4 kb           Approx. 8.1 kb   CYP52A5A isolated from the 1st library
pPA5         pTriplEx      CYP52A5B              4.1 kb           Approx. 7.8 kb   CYP52A5B isolated from the 2nd library
pPA15        pTriplEx      CYP52A2A              6.0 kb           Approx. 9.7 kb   CYP52A2A isolated from the 2nd library
pPA57        pTriplEx      CYP52A3A              5.5 kb           Approx. 9.2 kb   CYP52A3A isolated from the 2nd library
pPA62        pTriplEx      CYP52A3B              6.0 kb           Approx. 9.7 kb   CYP52A3B isolated from the 2nd library
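
The exact sizes listed in Table 2 for the two pURAin-derived integration plasmids can be cross-checked with simple arithmetic. The sketch below assumes that the listed plasmid size is the pURAin backbone plus the PCR insert, with no net loss of vector sequence; the function and variable names are illustrative only.

```python
# Sizes in base pairs as listed in Table 2.
PURAIN_BP = 4399

# name -> (insert gene, insert size in bp, listed plasmid size in bp)
CONSTRUCTS = {
    "pURA2in": ("CYP52A2A", 2230, 6629),
    "pURAREDBin": ("CPRB", 3266, 7665),
}


def expected_size(insert_bp, backbone_bp=PURAIN_BP):
    """Expected plasmid size, assuming backbone plus insert with nothing removed."""
    return backbone_bp + insert_bp


for name, (gene, insert_bp, listed_bp) in CONSTRUCTS.items():
    calc = expected_size(insert_bp)
    status = "matches" if calc == listed_bp else "differs from"
    print(f"{name} ({gene}): {calc} bp {status} listed {listed_bp} bp")
```

Both constructs agree with the listed values under this assumption. The same check is not meaningful for the library clones, whose sizes are only given approximately.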



EXAMPLE 1

Purification of Genomic DNA from Candida tropicalis ATCC 20336

A. Construction of Genomic Libraries

50 ml of YEPD broth (see Chart) was inoculated with a single colony of C.
tropicalis 20336 from a YEPD agar plate and grown overnight at 30°C. 5 ml of the overnight
culture was inoculated into 100 ml of fresh YEPD broth and incubated at 30°C for 4 to 5 hr with
shaking. Cells were harvested by centrifugation, washed twice with sterile distilled water and
resuspended in 4 ml of spheroplasting buffer (1 M sorbitol, 50 mM EDTA, 14 mM
mercaptoethanol) and incubated for 30 min at 37°C with gentle shaking. 0.5 ml of 2 mg/ml
zymolyase (ICN Pharmaceuticals, Inc., Irvine, CA) was added and incubated at 37°C with gentle
shaking for 30 to 60 min. Spheroplast formation was monitored by SDS lysis. Spheroplasts
were harvested by brief centrifugation (4,000 rpm, 3 min) and were washed once with the
spheroplast buffer without mercaptoethanol. Harvested spheroplasts were then suspended in 4
ml of lysis buffer (0.2 M Tris/pH 8.0, 50 mM EDTA, 1% SDS) containing 100 µg/ml RNase
(Qiagen Inc., Chatsworth, CA) and incubated at 37°C for 30 to 60 min.

Proteins were denatured and extracted twice with an equal volume of
chloroform/isoamyl alcohol (24:1) by gently mixing the two phases by hand inversions. The two
phases were separated by centrifugation at 10,000 rpm for 10 min and the aqueous phase
containing the high-molecular weight DNA was recovered. To the aqueous layer NaCl was
added to a final concentration of 0.2 M and the DNA was precipitated by adding 2 vol of ethanol.
Precipitated DNA was spooled with a clean glass rod and resuspended in TE buffer (10 mM
Tris/pH 8.0, 1 mM EDTA) and allowed to dissolve overnight at 4°C. To the dissolved DNA,
RNase free of any DNase activity (Qiagen Inc., Chatsworth, CA) was added to a final
concentration of 50 µg/ml and incubated at 37°C for 30 min. Then protease (Qiagen Inc.,
Chatsworth, CA) was added to a final concentration of 100 µg/ml and incubated at 55 to 60°C
for 30 min. The solution was extracted once with an equal volume of phenol/chloroform/isoamyl
alcohol (25:24:1) and once with an equal volume of chloroform/isoamyl alcohol (24:1). To the
aqueous phase 0.1 vol of 3 M sodium acetate and 2 volumes of ice cold ethanol (200 proof) were
added and the high molecular weight DNA was spooled with a glass rod and dissolved in 1 to 2
ml of TE buffer.
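
The final precipitation step is specified in relative volumes. As an illustration only, the sketch below converts those ratios into absolute volumes for a given aqueous phase, assuming that both the 0.1 vol of sodium acetate and the 2 volumes of ethanol are reckoned against the starting aqueous volume; the 4 ml sample volume is a hypothetical input, not a value stated for this step.

```python
def precipitation_volumes(aqueous_ml, acetate_vol=0.1, ethanol_vol=2.0):
    """Return (ml of 3 M sodium acetate, ml of ethanol) for an aqueous phase.

    Both additions are computed relative to the starting aqueous volume,
    which is an assumption about how the ratios are meant to be applied.
    """
    return aqueous_ml * acetate_vol, aqueous_ml * ethanol_vol


def acetate_molarity_before_ethanol(aqueous_ml, stock_molar=3.0, acetate_vol=0.1):
    """Sodium acetate concentration after mixing, before ethanol is added."""
    acetate_ml = aqueous_ml * acetate_vol
    return stock_molar * acetate_ml / (aqueous_ml + acetate_ml)


acetate_ml, ethanol_ml = precipitation_volumes(4.0)  # hypothetical 4 ml sample
print(f"add {acetate_ml:.2f} ml 3 M sodium acetate and {ethanol_ml:.1f} ml ethanol")
print(f"acetate before ethanol: {acetate_molarity_before_ethanol(4.0):.3f} M")
```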

B. Genomic DNA Preparation for PCR
Amplification of CYP and CPR Genes

Five ml of YPD medium was inoculated with a single colony and grown at
30°C overnight. The culture was centrifuged for 5 min at 1200 x g. The supernatant was
removed by aspiration and 0.5 ml of a sorbitol solution (0.9 M sorbitol, 0.1 M Tris-Cl pH 8.0,
0.1 M EDTA) was added to the pellet. The pellet was resuspended by vortexing and 1 µl of 2-
mercaptoethanol and 50 µl of a 10 µg/ml zymolyase solution were added to the mixture. The
tube was incubated at 37°C for 1 hr on a rotary shaker (200 rpm). The tube was then
centrifuged for 5 min at 1200 x g and the supernatant was removed by aspiration. The protoplast
pellet was resuspended in 0.5 ml 1x TE (10 mM Tris-Cl pH 8.0, 1 mM EDTA) and transferred
to a 1.5 ml microcentrifuge tube. The protoplasts were lysed by the addition of 50 µl 10% SDS
followed by incubation at 65°C for 20 min. Next, 200 µl of 5 M potassium acetate was added
and after mixing, the tube was incubated on ice for at least 30 min. Cellular debris was removed
by centrifugation at 13,000 x g for 5 min. The supernatant was carefully removed and
transferred to a new microfuge tube. The DNA was precipitated by the addition of 1 ml 100%
(200 proof) ethanol followed by centrifugation for 5 min at 13,000 x g. The DNA pellet was
washed with 1 ml 70% ethanol followed by centrifugation for 5 min at 13,000 x g. After
partially drying the DNA under a vacuum, it was resuspended in 200 µl of 1x TE. The DNA
concentration was determined by the ratio of the absorbance at 260 nm / 280 nm (A260/280).
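
The text states only that absorbance at 260 nm and 280 nm was read. As a hedged illustration of how such readings are commonly turned into numbers, the sketch below assumes the widely used convention that an A260 of 1.0 corresponds to roughly 50 µg/ml of double-stranded DNA; the readings and dilution factor shown are hypothetical.

```python
# Conventional conversion factor for double-stranded DNA (an assumption here,
# not a value given in the text): A260 of 1.0 is about 50 ug/ml.
DSDNA_UG_PER_ML_PER_A260 = 50.0


def dsdna_concentration_ug_per_ml(a260, dilution_factor=1.0):
    """Estimate dsDNA concentration of the undiluted sample from an A260 reading."""
    return a260 * DSDNA_UG_PER_ML_PER_A260 * dilution_factor


def purity_ratio(a260, a280):
    """A260/A280 ratio, commonly used as a rough indicator of protein contamination."""
    return a260 / a280


# Hypothetical readings for a sample diluted 1:20 before measurement.
a260, a280, dilution = 0.25, 0.14, 20
print(f"concentration = {dsdna_concentration_ug_per_ml(a260, dilution):.0f} ug/ml")
print(f"A260/A280 = {purity_ratio(a260, a280):.2f}")
```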

EXAMPLE 2

Construction of Candida tropicalis 20336 Genomic Libraries

Three genomic libraries of C. tropicalis were constructed, two at Clontech
Laboratories, Inc. (Palo Alto, CA) and one at Henkel Corporation (Cincinnati, OH).

A. Clontech Libraries

The first Clontech library was made as follows: Genomic DNA was prepared
from C. tropicalis 20336 as described above, partially digested with EcoRI and size fractionated
by gel electrophoresis to eliminate fragments smaller than 0.6 kb. Following size fractionation,
several ligations of the EcoRI genomic DNA fragments and lambda (λ) TriplEx™ vector (Figure
1) arms with EcoRI sticky ends were packaged into λ phage heads under conditions designed
to obtain one million independent clones. The second genomic library was constructed as
follows: Genomic DNA was digested partially with Sau3AI and size fractionated by gel
electrophoresis. The DNA fragments were blunt ended using standard protocols as described,
e.g., in Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor
Press, USA (1989), incorporated herein by reference. The strategy was to fill in the Sau3AI
overhangs with Klenow polymerase (Life Technologies, Grand Island, NY) followed by
digestion with S1 nuclease (Life Technologies, Grand Island, NY). After S1 nuclease digestion
the fragments were end filled one more time with Klenow polymerase to obtain the final
blunt-ended DNA fragments. EcoRI linkers were ligated to these blunt-ended DNA fragments
followed by ligation into the λTriplEx vector. The resultant library contained approximately 2 X
10^6 independent clones with an average insert size of 4.5 kb.
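
Whether a library of this size is likely to contain a given gene can be estimated with the standard Clarke and Carbon relation, N = ln(1 - P) / ln(1 - f), where f is the ratio of insert size to genome size. The sketch below applies it to the clone number and insert size stated above. The genome size is NOT given in the text; the 15 Mb figure is a placeholder assumption chosen only to make the example concrete.

```python
import math

ASSUMED_GENOME_BP = 15_000_000  # hypothetical haploid genome size, not from the text


def clones_needed(probability, insert_bp, genome_bp=ASSUMED_GENOME_BP):
    """Clarke-Carbon estimate of clones needed to find any sequence with probability P."""
    fraction = insert_bp / genome_bp
    return math.log(1.0 - probability) / math.log(1.0 - fraction)


def coverage_probability(n_clones, insert_bp, genome_bp=ASSUMED_GENOME_BP):
    """Probability that a given sequence is present in a library of n_clones."""
    fraction = insert_bp / genome_bp
    return 1.0 - (1.0 - fraction) ** n_clones


n99 = clones_needed(0.99, insert_bp=4500)
p_library = coverage_probability(2_000_000, insert_bp=4500)
print(f"clones needed for 99% probability: {n99:,.0f}")
print(f"probability with 2 x 10^6 clones: {p_library:.6f}")
```

Under this assumed genome size, the stated library exceeds the 99% threshold by a wide margin; a different genome size changes the numbers but the calculation is the same.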

B. Henkel Library

The third genomic library was constructed at Henkel Corporation using λZAP
Express™ vector (Stratagene, La Jolla, CA) (Figure 2). Genomic DNA was partially digested
with Sau3AI and fragments in the range of 6 to 12 kb were purified from an agarose gel after
electrophoresis of the digested DNA. These DNA fragments were then ligated to BamHI
digested λZAP Express™ vector arms according to manufacturer's protocols. Three ligations
were set up to obtain approximately 9.8 X 10^3 independent clones. All three libraries were
pooled and amplified according to manufacturer's instructions to obtain high-titre (>10^9 plaque
forming units/ml) stock for long-term storage. The titre of packaged phage library was
ascertained after infection of E. coli XL1Blue-MRF' cells. E. coli XL1Blue-MRF' were grown
overnight in either LB medium or NZCYM (see Chart) containing 10 mM MgSO4 and 0.2%
maltose at 37°C or 30°C, respectively, with shaking. Cells were then centrifuged and
resuspended in 0.5 to 1 volume of 10 mM MgSO4. 200 µl of this E. coli culture was mixed with
several dilutions of packaged phage library and incubated at 37°C for 15 min. To this mixture
2.5 ml of LB top agarose or NZCYM top agarose (maintained at 60°C) (see Chart) was added
and plated on LB agar or NZCYM agar (see Chart) present in 82 mm petri dishes. Phage were
allowed to propagate overnight at 37°C to obtain discrete plaques and the phage titre was
determined.
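
The titre follows from the plaque count, the dilution plated and the volume of diluted phage used. A minimal sketch of that calculation is given below; the plaque count, dilution and plated volume are hypothetical values chosen for illustration and do not come from the text.

```python
def titer_pfu_per_ml(plaques, dilution_factor, volume_plated_ml):
    """Phage titre of the undiluted stock in plaque forming units per ml.

    dilution_factor is the fold dilution of the stock (e.g. 1e6 for a
    10^-6 dilution); volume_plated_ml is the volume of diluted phage plated.
    """
    return plaques * dilution_factor / volume_plated_ml


# Hypothetical plate: 150 plaques from 0.1 ml of a 10^-6 dilution.
titer = titer_pfu_per_ml(plaques=150, dilution_factor=1e6, volume_plated_ml=0.1)
print(f"titre = {titer:.2e} pfu/ml")
print("meets >10^9 pfu/ml target:", titer > 1e9)
```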



EXAMPLE 3

Screening of Genomic Libraries

Both λTriplEx™ and λZAP Express™ vectors are phagemid vectors that can be
propagated either as phage or plasmid DNA (after conversion of phage to plasmid). Therefore,
the genomic libraries constructed in these vectors can be screened either by plaque hybridization
(screening of lambda form of library) or by colony hybridization (screening plasmid form of
library after phage to plasmid conversion). Both vectors are capable of expressing the cloned
genes and the main difference is the mechanism of excision of plasmid from the phage DNA.
The cloning site in λTriplEx™ is located within a plasmid which is present in the phage and is
flanked by loxP sites (Figure 1). When λTriplEx™ is introduced into E. coli strain BM25.8
(supplied by Clontech), the Cre recombinase present in BM25.8 promotes the excision and
circularization of plasmid pTriplEx from the phage λTriplEx™ at the loxP sites. The
mechanism of excision of plasmid pBK-CMV from phage λZAP Express™ is different. It
requires the assistance of a helper phage such as ExAssist™ (Stratagene) and an E. coli strain
such as XLOLR (Stratagene). Both pTriplEx and pBK-CMV can replicate autonomously in E.
coli.

A. Screening Genomic Libraries (Plasmid Form)

1) Colony Lifts

A single colony of E. coli BM25.8 was inoculated into 5 ml of LB containing 50
µg/ml kanamycin, 10 mM MgSO4 and 0.1% maltose and grown overnight at 31°C, 250 rpm. To
200 µl of this overnight culture (~ 4 X 10^8 cells) 1 µl of phage library (2 - 5 X 10^6 plaque
forming units) and 150 µl LB broth were added and incubated at 31°C for 30 min after which
400 µl of LB broth was added and incubated at 31°C, 225 rpm for 1 h. This bacterial culture
was diluted and plated on LB agar containing 50 µg/ml ampicillin (Sigma Chemical Company,
St. Louis, MO) and kanamycin (Sigma Chemical Company) to obtain 500 to 600 colonies/plate.
The plates were incubated at 37°C for 6 to 7 hrs until the colonies became visible. The plates
were then stored at 4°C for 1.5 h before placing a Colony/Plaque Screen™ Hybridization
Transfer Membrane disc (DuPont NEN Research Products, Boston, MA) on the plate in contact
with bacterial colonies. The transfer of colonies to the membrane was allowed to proceed for 3 to
5 min. The membrane was then lifted and placed on a fresh LB agar (see Chart) plate containing
200 µg/ml of chloramphenicol with the side exposed to the bacterial colonies facing up. The
plates containing the membranes were then incubated at 37°C overnight in order to allow full
development of the bacterial colonies. The LB agar plates from which colonies were initially
lifted were incubated at 37°C overnight and stored at 4°C for future use. The following
morning the membranes containing bacterial colonies were lifted and placed on two sheets of
Whatman 3M (Whatman, Hillsboro, OR) paper saturated with 0.5 N NaOH and left at room
temperature (RT) for 3 to 6 min to lyse the cells. Additional treatment of membranes was as
described in the protocol provided by NEN Research Products.



2) DNA Hybridizations

Membranes were dried overnight before hybridizing to oligonucleotide probes
prepared using a non-radioactive ECL™ 3'-oligolabelling and detection system from Amersham
Life Sciences (Arlington Heights, IL). DNA labeling, prehybridization and hybridizations were
performed according to manufacturer's protocols. After hybridization, membranes were washed
twice at room temperature in 5 X SSC, 0.1% SDS (in a volume equivalent to 2 ml/cm^2 of
membrane) for 5 min each followed by two washes at 50°C in 1 X SSC, 0.1% SDS (in a volume
equivalent to 2 ml/cm^2 of membrane) for 15 min each. The hybridization signal was then
generated and detected with Hyperfilm ECL™ (Amersham) according to manufacturer's
protocols. Membranes were aligned to plates containing bacterial colonies from which colony
lifts were performed and colonies corresponding to positive signals on X-ray film were then isolated
and propagated in LB broth. Plasmid DNAs were isolated from these cultures and analyzed by
restriction enzyme digestions and by DNA sequencing.

B. Screening Genomic Libraries (Plaque Form)

1) λ Library Plating

E. coli XL1Blue-MRF' cells were grown overnight in LB medium (25 ml)
containing 10 mM MgSO4 and 0.2% maltose at 37°C, 250 rpm. Cells were then centrifuged
(2,200 x g for 10 min) and resuspended in 0.5 volumes of 10 mM MgSO4. 500 µl of this E. coli
culture was mixed with a phage suspension containing 25,000 amplified lambda phage particles
and incubated at 37°C for 15 min. To this mixture 6.5 ml of NZCYM top agarose (maintained at
60°C) (see Chart) was added and plated on 80-100 ml NZCYM agar (see Chart) present in a
150 mm petri dish. Phage were allowed to propagate overnight at 37°C to obtain discrete
plaques. After overnight growth plates were stored in a refrigerator for 1-2 hr before plaque lifts
were performed.

2) Plaque Lift and DNA Hybridizations

Magna Lift™ nylon membranes (Micron Separations, Inc., Westborough, MA)
were placed on the agar surface in complete contact with λ plaques and transfer of plaques to
nylon membranes was allowed to proceed for 5 min at RT. After plaque transfer the membrane
was placed on 2 sheets of Whatman 3M™ (Whatman, Hillsboro, OR) filter paper saturated with
a 0.5 N NaOH, 1.0 M NaCl solution and left for 10 min at RT to denature DNA. Excess
denaturing solution was removed by blotting briefly on dry Whatman 3M paper. Membranes
were then transferred to 2 sheets of Whatman 3M™ paper saturated with 0.5 M Tris-HCl (pH
8.0), 1.5 M NaCl and left for 5 min to neutralize. Membranes were then briefly washed in 200 -
500 ml of 2 X SSC, dried by air and baked for 30 - 40 min at 80°C. The membranes were then
probed with labelled DNA.

Membranes were prewashed with a 200 - 500 ml solution of 5 X SSC, 0.5% SDS,
1 mM EDTA (pH 8.0) for 1 - 2 hr at 42°C with shaking (60 rpm) to get rid of bacterial debris
from the membranes. The membranes were prehybridized for 1 - 2 hr at 42°C (in a volume
equivalent to 0.125 - 0.25 ml/cm^2 of membrane) with ECL Gold™ buffer (Amersham) containing 0.5
M NaCl and 5% blocking reagent. DNA fragments that were used as probes were purified from
agarose gel using a QIAEX II™ gel extraction kit (Qiagen Inc., Chatsworth, CA) according to
manufacturer's protocol and labeled using an Amersham ECL™ direct nucleic acid labeling kit
(Amersham). Labeled DNA (5-10 ng/ml hybridization solution) was added to the prehybridized
membranes and the hybridization was allowed to proceed overnight. The following day
membranes were washed with shaking (60 rpm) twice at 42°C for 20 min each time (in a
volume equivalent to 2 ml/cm^2 of membrane) in a buffer containing either 0.1 (high stringency) or
0.5 (low stringency) X SSC, 0.4% SDS and 360 g/l urea. This was followed by two 5 min
washes at room temperature (in a volume equivalent to 2 ml/cm^2 of membrane) in 2 X SSC.
Hybridization signals were generated using the ECL™ nucleic acid detection reagent and
detected using Hyperfilm ECL™ (Amersham).
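
Because the prehybridization and wash volumes are specified per cm^2 of membrane, the working volume depends on the membrane area. The sketch below shows the conversion for circular membranes. The 82 mm membrane diameter and the count of four membranes are hypothetical inputs; the text gives dish sizes but does not state the membrane diameter.

```python
import math


def disc_area_cm2(diameter_mm):
    """Area of a circular membrane in cm^2, given its diameter in mm."""
    radius_cm = diameter_mm / 20.0  # mm -> cm, diameter -> radius
    return math.pi * radius_cm ** 2


def buffer_volume_ml(n_membranes, diameter_mm, ml_per_cm2):
    """Buffer volume for n circular membranes at a given ml/cm^2 loading."""
    return n_membranes * disc_area_cm2(diameter_mm) * ml_per_cm2


area = disc_area_cm2(82)  # hypothetical 82 mm membrane
wash_ml = buffer_volume_ml(4, 82, ml_per_cm2=2.0)        # stringency wash
prehyb_low = buffer_volume_ml(4, 82, ml_per_cm2=0.125)   # prehybridization, low end
prehyb_high = buffer_volume_ml(4, 82, ml_per_cm2=0.25)   # prehybridization, high end

print(f"area per membrane = {area:.1f} cm^2")
print(f"wash volume for 4 membranes = {wash_ml:.0f} ml")
print(f"prehybridization volume = {prehyb_low:.0f} to {prehyb_high:.0f} ml")
```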

Agar plugs which contained plaques corresponding to positive signals on the X-
ray film were taken from the master plates using the broad end of a Pasteur pipet. Plaques were
selected by aligning the plates with the X-ray film. At this stage, multiple plaques were generally
taken. Phage particles were eluted from the agar plugs by soaking in 1 ml SM buffer (Sambrook
et al., supra) overnight. The phage eluate was then diluted and plated with freshly grown E. coli
XL1Blue-MRF' cells to obtain 100 - 500 plaques per 85 mm NZCYM agar plate. Plaques were
transferred to Magna Lift nylon membranes as before and probed again using the same probe.
Single well-isolated plaques corresponding to signals on X-ray film were picked by removing
agar plugs and eluting the phage by soaking overnight in 0.5 ml SM buffer.

C. Conversion of λ Clones to Plasmid Form

The lambda clones isolated were converted to plasmid form for further analysis.
Conversion from the plaque to the plasmid form was accomplished by infecting E. coli strain
BM25.8 with phage from the plaques. The E. coli strain was grown overnight at 31°C, 250 rpm in LB broth
containing 10 mM MgSO4 and 0.2% maltose until the OD600 reached 1.1-1.4. Ten milliliters of
the overnight culture was removed and mixed with 100 µl of 1 M MgCl2. A 200 µl volume of
cells was removed, mixed with 150 µl of eluted phage suspension and incubated at 31°C for 30
min. LB broth (400 µl) was added to the tube and incubation was continued at 31°C for 1 hr
with shaking, 250 rpm. 1 - 10 µl of the infected cell suspension was plated on LB agar
containing 100 µg/ml ampicillin (Sigma, St. Louis, MO). Well-isolated colonies were picked
and grown overnight in 5 ml LB broth containing 100 µg/ml ampicillin at 37°C, 250 rpm.
Plasmid DNA was isolated from these cultures and analyzed. To convert the λZAP Express™
vector to plasmid form E. coli strains XL1Blue-MRF' and XLOLR were used. The conversion
was performed according to the manufacturer's (Stratagene) protocols for single-plaque
excision.

EXAMPLE 4

Transformation of C. tropicalis H5343 ura-

A. Transformation of C. tropicalis H5343 by Electroporation

5 ml of YEPD was inoculated with C. tropicalis H5343 ura- from a frozen
stock and incubated overnight on a New Brunswick shaker at 30°C and 170 rpm. The next day,
10 µl of the overnight culture was inoculated into 100 ml YEPD and growth was continued at
30°C, 170 rpm. The following day the cells were harvested at an OD600 of 1.0 and the cell
pellet was washed one time with sterile ice-cold water. The cells were resuspended in ice-cold
sterile 35% polyethylene glycol (4,000 MW) to a density of 5 x 10^8 cells/ml. A 0.1 ml volume of
cells was utilized for each electroporation. The following electroporation protocol was
followed: 1.0 µg of transforming DNA was added to 0.1 ml cells, along with 5 µg denatured,
sheared calf thymus DNA and the mixture was allowed to incubate on ice for 15 min. The cell
solution was then transferred to an ice-cold 0.2 cm electroporation cuvette, tapped to make sure
the solution was on the bottom of the cuvette and electroporated. The cells were electroporated
using an Invitrogen electroporator (Carlsbad, CA) at 450 Volts, 200 Ohms and 250 µF.
Following electroporation, 0.9 ml SOS media (1 M sorbitol, 30% YEPD, 10 mM CaCl2) was
added to the suspension. The resulting culture was grown for 1 hr at 30°C, 170 rpm. Following
the incubation, the cells were pelleted by centrifugation at 1500 x g for 5 min. The
electroporated cells were resuspended in 0.2 ml of 1 M sorbitol and plated on synthetic complete
media minus uracil (SC - uracil) (Nelson, supra). In some cases the electroporated cells were
plated directly onto SC - uracil. Growth of transformants was monitored for 5 days. After three
days, several transformants were picked and transferred to SC - uracil plates for genomic DNA
preparation and screening.
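
The electroporation settings above can be translated into derived pulse parameters. The sketch below computes the nominal field strength from the voltage and cuvette gap, the nominal RC time constant from the resistance and capacitance settings, and the number of cells per pulse. These are nominal values only: the actual time constant also depends on the resistance of the sample, which is not given in the text.

```python
def field_strength_kv_per_cm(voltage_v, gap_cm):
    """Nominal electric field across the cuvette gap, in kV/cm."""
    return voltage_v / gap_cm / 1000.0


def time_constant_ms(resistance_ohm, capacitance_uf):
    """Nominal RC time constant in milliseconds (sample resistance ignored)."""
    return resistance_ohm * capacitance_uf * 1e-6 * 1e3


def cells_per_pulse(density_per_ml, volume_ml):
    """Number of cells in one electroporation sample."""
    return density_per_ml * volume_ml


field = field_strength_kv_per_cm(450, 0.2)   # 450 V across a 0.2 cm cuvette
tau = time_constant_ms(200, 250)             # 200 Ohms, 250 uF
cells = cells_per_pulse(5e8, 0.1)            # 5 x 10^8 cells/ml, 0.1 ml

print(f"field strength = {field:.2f} kV/cm")
print(f"nominal time constant = {tau:.0f} ms")
print(f"cells per pulse = {cells:.1e}")
```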



B. Transformation of C. tropicalis Using Lithium Acetate

The following protocol was used to transform C. tropicalis in accordance with the
procedures described in Current Protocols in Molecular Biology, Supplement 5, 13.7.1 (1989),
incorporated herein by reference.

5 ml of YEPD was inoculated with C. tropicalis H5343 ura- from a frozen stock
and incubated overnight on a New Brunswick shaker at 30°C and 170 rpm. The next day, 10 µl
of the overnight culture was inoculated into 50 ml YEPD and growth was continued at 30°C, 170
rpm. The following day the cells were harvested at an OD600 of 1.0. The culture was transferred
to a 50 ml polypropylene tube and centrifuged at 1000 X g for 10 min. The cell pellet was
resuspended in 10 ml sterile TE (10 mM Tris-Cl and 1 mM EDTA, pH 8.0). The cells were again
centrifuged at 1000 X g for 10 min and the cell pellet was resuspended in 10 ml of a sterile
lithium acetate solution [LiAc (0.1 M lithium acetate, 10 mM Tris-Cl, pH 8.0, 1 mM EDTA)].
Following centrifugation at 1000 X g for 10 min, the pellet was resuspended in 0.3 ml LiAc.
This solution was incubated for one hour at 30°C while shaking gently at 50 rpm. A 0.1 ml
aliquot of this suspension was incubated with 5 µg of transforming DNA at 30°C with no
shaking for 30 min. A 0.7 ml PEG solution (40% wt/vol polyethylene glycol 3340, 0.1 M
lithium acetate, 10 mM Tris-Cl, pH 8.0, 1 mM EDTA) was added and incubated at 30°C for 45
min. The tubes were then placed at 42°C for 5 min. A 0.2 ml aliquot was plated on synthetic
complete media minus uracil (SC - uracil) (Kaiser et al., Methods in Yeast Genetics, Cold Spring
Harbor Laboratory Press, USA, 1994, incorporated herein by reference). Growth of
transformants was monitored for 5 days. After three days, several transformants were picked and
transferred to SC - uracil plates for genomic DNA preparation and screening.



EXAMPLE 5

Plasmid DNA Isolation

Plasmid DNA was isolated from E. coli cultures using a Qiagen plasmid isolation
kit (Qiagen Inc., Chatsworth, CA) according to manufacturer's instructions.

EXAMPLE 6

DNA Sequencing and Analysis

DNA sequencing was performed at Sequetech Corporation (Mountain View, CA)
using an Applied Biosystems automated sequencer (Perkin Elmer, Foster City, CA). DNA
sequences were analyzed with MacVector and GeneWorks software packages (Oxford Molecular
Group, Campbell, CA).

EXAMPLE 7

PCR Protocols

PCR amplification was carried out in a Perkin Elmer Thermocycler using the
AmpliTaq Gold enzyme (Perkin Elmer Cetus, Foster City, CA) kit according to manufacturer's
specifications. Following successful amplification, in some cases, the products were digested
with the appropriate enzymes and gel purified using QiaexII (Qiagen, Chatsworth, CA) as per
manufacturer's instructions. In specific cases the Ultma Taq polymerase (Perkin Elmer Cetus,
Foster City, CA) or the Expand Hi-Fi Taq polymerase (Boehringer Mannheim, Indianapolis, IN)
were used per manufacturer's recommendations or as defined in Table 3.



Table 3. PCR amplification conditions used with different primer combinations.

PRIMER                  Taq                TEMPLATE        ANNEALING      EXTENSION        CYCLE
COMBINATION                                DENATURING      TEMP/TIME      TEMP/TIME        NUMBER
                                           CONDITION

3674-41-1/41-2/41-4     AmpliTaq Gold      94°C/30 sec     55°C/30 sec    72°C/1 min       30
+ 3674-41-4

URA Primer 1a           AmpliTaq Gold      95°C/1 min      70°C/1 min     72°C/2 min       35
URA Primer 1b

URA Primer 2a           AmpliTaq Gold      95°C/1 min      70°C/1 min     72°C/2 min       35
URA Primer 2b

CYP2A#1                 AmpliTaq Gold      95°C/1 min      70°C/1 min     72°C/2 min       35
CYP2A#2

CYP3A#1                 Ultma Taq          95°C/1 min      70°C/1 min     72°C/1 min       30
CYP3A#2

CPRB#1                  Expand Hi-Fi Taq   94°C/15 sec     50°C/30 sec    68°C/3 min       10
CPRB#2                                     94°C/15 sec     50°C/30 sec    68°C/3 min       15
                                                                          +20 sec/cycle

CYP5A#1                 Expand Hi-Fi Taq   94°C/15 sec     50°C/30 sec    68°C/3 min       10
CYP5A#2                                    94°C/15 sec     50°C/30 sec    68°C/3 min       15
                                                                          +20 sec/cycle
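
The two-stage Expand Hi-Fi program in Table 3 (10 cycles at fixed extension time, then 15 cycles with the extension lengthened by 20 sec per cycle) can be written out as a small calculation of total programmed hold time. The sketch assumes that the 20 sec increment is applied starting with the first cycle of the second stage, and it ignores ramp times and any initial denaturation or final extension steps, none of which are specified in the table.

```python
def stage_seconds(cycles, denature_s, anneal_s, extend_s, extend_increment_s=0):
    """Total hold time for one stage of a cycling program, in seconds.

    The extension step of cycle k (1-based) lasts
    extend_s + k * extend_increment_s, which is an assumption about
    when the per-cycle increment starts.
    """
    total = 0
    for cycle in range(1, cycles + 1):
        total += denature_s + anneal_s + extend_s + extend_increment_s * cycle
    return total


# Expand Hi-Fi program from Table 3: 94 C/15 sec, 50 C/30 sec, 68 C/3 min.
stage1 = stage_seconds(cycles=10, denature_s=15, anneal_s=30, extend_s=180)
stage2 = stage_seconds(cycles=15, denature_s=15, anneal_s=30, extend_s=180,
                       extend_increment_s=20)
total_s = stage1 + stage2

print(f"stage 1: {stage1} s, stage 2: {stage2} s")
print(f"total hold time: {total_s} s ({total_s / 60:.1f} min)")
```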





Table 4 below contains a list of primers (SEQ ID NOS: 1-35) used for PCR amplification to
construct gene integration vectors or to generate probes for gene detection and isolation.

Table 4. Primer table for PCR amplification to construct gene integration vectors, to generate
probes for gene isolation and detection and to obtain DNA sequence of constructs. (A-
deoxyadenosine triphosphate [dATP], G- deoxyguanosine triphosphate [dGTP], C-
deoxycytosine triphosphate [dCTP], T- deoxythymidine triphosphate [dTTP], Y- dCTP or dTTP,
R- dATP or dGTP, W- dATP or dTTP, M- dATP or dCTP, N- dATP or dCTP or dGTP or
dTTP).



Target           Patent Primer    Lab Primer   Sequence (5' to 3')                    SEQ ID NO   PCR
gene(s)          Name             Name                                                            Product Size

CYP52A2A         CYP2A#1          3659-72M     CCTTAATTAAATGCACGAAGCGGAGATAAAAG       1           2230 bp
                 CYP2A#2          3659-72N     CCTTAATTAAGCATAAGCTTGCTCGAGTCT         2

CYP52A3A         CYP3A#1          3659-72O     CCTTAATTAAACGCAATGGGAACATGGAGTG        3           2154 bp
                 CYP3A#2          3659-72P     CCTTAATTAATCGCACTACGGTTATTGGTATCAG     4

CYP52A5A         CYP5A#1          3659-72K     CCTTAATTAATCAAAGTACGTTCAGGCGG          5           3298 bp
                 CYP5A#2          3659-72L     CCTTAATTAAGGCAGACAACAACTTGGCAAAGTC     6

CPRB             CPRB#1           3698-20A     CCTTAATTAAGAGGTCGTTGGTTGAGTTTTC        7           3266 bp
                 CPRB#2           3698-20B     CCTTAATTAATTGATAATGACGTTGCGGG          8

URA3A            URA Primer 1a    3698-7C      AGGCGCGCCGGAGTCCAAAAAGACCAACCTCTG      9           956 bp
                 URA Primer 1b    3698-7D      CCTTAATTAATACGTGGATACCTTCAAGCAAGTG     10

URA3A            URA Primer 2a    3698-7A      CCTTAATTAAGCTCACGAGTTTTGGCATTTTCGAG    11          750 bp
                 URA Primer 2b    3698-7B      GGGTTTAAACCGCAGAGGTTGGTCTTTTTGGACTC    12

                                               GGGTTTAAAC - PmeI restriction site     13
                                               AGGCGCGCC - AscI restriction site      14
                                               CCTTAATTAA - PacI restriction site     15

CPR              FMN1             3674-41-1    TCYCAAACWGGTACWGCWGAA                  16
CPR              FMN2             3674-41-2    GGTTTGGGTAAYTCWACTTAT                  17
CPR              FAD              3674-41-3    CGTTATTAYTCYATTTCTTC                   18
CPR              NADPH            3674-41-4    GCMACACCRGTACCTGGACC                   19
CPR              PRK1.F3          PRK1.F3      ATCCCAATCGTAATCAGC                     20
CPR              PRK1.F5          PRK1.F5      ACTTGTCTTCGTTTAGCA                     21
CPR              PRK4.R20         PRK4.R20     CTACGTCTGTGGTGATGC                     22
CYP              UCup1            UCup1        CGNGAYACNACNGCNGG                      23
CYP              UCup2            UCup2        AGRGAYACNACNGCNGG                      24
CYP              UCdown1          UCdown1      AGNGCRAAYTGYTGNCC                      25
CYP              UCdown2          UCdown2      YAANGCRAAYTGYTGNCC                     26
CYP              HemeB1           HemeB1       ATTCAACGGTGGTCCAAGAATCTGTTTGG          27
CYP              2,3,5P           2,3,5P       GAGCTATGTTGAGACCACAGTTTGC              28
CYP              2,3,5M           2,3,5M       CTTCAGTTAAAGCAAATTGTTTGGCC             29
pTriplEx vector  Triplex5'        Triplex5'    CTCGGGAAGCGCGCCATTGTGTTGG              30
pTriplEx vector  Triplex3'        Triplex3'    TAATACGACTCACTATAGGGCGAATTGGC          31
CYP              Cyp52a           Cyp52a       TGRYTCAAACCATCTYTCTGG                  32
CYP              Cyp52b           Cyp52b       GGACCGGCGTTAAAGGG                      33
CYP              Cyp52c           Cyp52c       CATAGTCGWATYATGCTTAGACC                34
CYP              Cyp52d           Cyp52d       GGACCACCATTGAATGG                      35
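
Several primers in Table 4 are degenerate, using the ambiguity codes defined in the table legend (Y, R, W, M, N). The sketch below is an illustration of how such a primer corresponds to a pool of distinct sequences: it counts the degeneracy of a primer and enumerates the pool. The helper names are illustrative; the primer sequences are those of SEQ ID NOS: 16, 19 and 23.

```python
from itertools import product
from math import prod

# Ambiguity codes as defined in the Table 4 legend.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "Y": "CT", "R": "AG", "W": "AT", "M": "AC", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer)


def expand(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer))]


FMN1 = "TCYCAAACWGGTACWGCWGAA"   # SEQ ID NO: 16
NADPH = "GCMACACCRGTACCTGGACC"   # SEQ ID NO: 19
UCUP1 = "CGNGAYACNACNGCNGG"      # SEQ ID NO: 23

for name, seq in (("FMN1", FMN1), ("NADPH", NADPH), ("UCup1", UCUP1)):
    print(f"{name}: {len(seq)}-mer, {degeneracy(seq)}-fold degenerate")

for seq in expand(NADPH):
    print(seq)
```

The CPR primers are only modestly degenerate, whereas the UCup and UCdown primers, which contain several N positions, represent much larger pools.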





-39- 



WO 00/20566 PCTAJS99/20797 

Yeast Colony PCR Procedure for Confirmation of Gene
Integration into the Genome of C. tropicalis

Single yeast colonies were removed from the surface of transformation plates,
suspended in 50 µl of spheroplasting buffer (50 mM KCl, 10 mM Tris-HCl, pH 8.3, 1.0 mg/ml
Zymolyase, 5% glycerol) and incubated at 37°C for 30 min. Following incubation, the solution
was heated for 10 min at 95°C to lyse the cells. Five µl of this solution was used as a template in
PCR. Expand Hi-Fi Taq polymerase (Boehringer Mannheim, Indianapolis, IN) was used in PCR
with a gene-specific primer (for the gene to be integrated) and a URA3 primer. If integration
had occurred, amplification yielded a PCR product of the predicted size, confirming the presence
of an integrated gene.



EXAMPLE 9

Fermentation Method for Gene Induction Studies

A fermentor was charged with a semi-synthetic growth medium having the
composition 75 g/l glucose (anhydrous), 6.7 g/l Yeast Nitrogen Base (Difco Laboratories), 3 g/l
yeast extract, 3 g/l ammonium sulfate, 2 g/l monopotassium phosphate, 0.5 g/l sodium chloride.
Components were made as concentrated solutions for autoclaving, then added to the fermentor
upon cooling; the final pH was approximately 5.2. This charge was inoculated with 5-10% of an
overnight culture of C. tropicalis ATCC 20962 prepared in YM medium (Difco Laboratories) as
described in the methods of Examples 17 and 20 of US Patent 5,254,466, which is incorporated
herein by reference. C. tropicalis ATCC 20962 is a POX4 and POX5 disrupted C. tropicalis
ATCC 20336. Air and agitation were supplied to maintain the dissolved oxygen at greater than
about 40% of saturation versus air. The pH was maintained at about 5.0 to 8.5 by the addition of
5N caustic soda on pH control. Both a fatty acid feedstream (commercial oleic acid in this
example) having a typical composition of 2.4% C14; 0.7% C14:1; 4.6% C16; 5.7% C16:1; 5.7% C17:1;
1.0% C18; 69.9% C18:1; 8.8% C18:2; 0.30% C20; 0.90% C20:1 and a glucose co-substrate feed were
added in a fed-batch mode beginning near the end of exponential growth. Caustic was added on
pH control during the bioconversion of fatty acids to diacids to maintain the pH in the desired
range. Typically, samples for gene induction studies were collected just prior to starting the fatty
acid feed and over the first 10 hours of bioconversion. Fatty acid and diacid content was
determined by a standard methyl ester protocol using gas liquid chromatography
(GLC). Gene induction was measured using the QC-RT-PCR protocol described in this
application.

EXAMPLE 10

RNA Preparation

The first step of this protocol involves the isolation of total cellular RNA from
cultures of C. tropicalis. The cellular RNA was isolated using the Qiagen RNeasy Mini Kit
(Qiagen Inc., Chatsworth, CA) as follows: 2 ml samples of C. tropicalis cultures were collected
from the fermentor in standard 2 ml screw-capped Eppendorf-style tubes at various times before
and after the addition of the fatty acid or alkane substrate. Cell samples were immediately frozen
in liquid nitrogen or a dry-ice/alcohol bath after harvesting from the fermentor. To isolate
total RNA from the samples, the tubes were allowed to thaw on ice, the cells were pelleted by
centrifugation in a microfuge for 5 minutes (min) at 4°C, and the supernatant was discarded while
keeping the pellet ice-cold. The microfuge tubes were filled 2/3 full with ice-cold Zirconia/Silica
beads (0.5 mm diameter, Biospec Products, Bartlesville, OK) and then filled to the top with
ice-cold RLT* lysis buffer (* buffer included with the Qiagen RNeasy Mini Kit). Cell rupture
was achieved by placing the samples in a mini bead beater (Biospec Products, Bartlesville, OK)
and immediately homogenizing at full speed for 2.5 min. The samples were allowed to cool in an
ice water bath for 1 minute and the homogenization/cooling process was repeated two more times
for a total of 7.5 min homogenization time in the bead beater. The homogenized cell samples were
microfuged at full speed for 10 min and 700 µl of the RNA-containing supernatant was removed and
transferred to a new Eppendorf tube. 700 µl of 70% ethanol was added to each sample followed
by mixing by inversion. This and all subsequent steps were performed at room temperature.
Seven hundred microliters of each ethanol-treated sample were transferred to a Qiagen RNeasy
spin column, followed by centrifugation at 8,000 x g for 15 sec. The flow through was
discarded and the column reloaded with the remaining sample (700 µl) and re-centrifuged at
8,000 x g for 15 sec. The column was washed once with 700 µl of buffer RW1*,
centrifuged at 8,000 x g for 15 sec and the flow through discarded. The column was placed in a
new 2 ml collection tube and washed with 500 µl of RPE* buffer and the flow through discarded.
The RPE* wash was repeated with centrifugation at 8,000 x g for 2 min and the flow through
discarded. The spin column was transferred to a new 1.5 ml collection tube and 100 µl of
RNase-free water added to the column followed by centrifugation at 8,000 x g for 15 seconds. An
additional 75 µl of RNase-free water was added to the column followed by centrifugation at
8,000 x g for 2 min. RNA eluted in the water flow through was collected for further purification.

The RNA eluate was then treated to remove contaminating DNA. Twenty
microliters of 10X DNase I buffer (0.5 M Tris (pH 7.5), 50 mM CaCl2, 100 mM MgCl2), 10 µl of
RNase-free DNase I (2 Units/µl, Ambion Inc., Austin, Texas) and 40 units RNasin (Promega
Corporation, Madison, Wisconsin) were added to the RNA sample. The mixture was then
incubated at 37°C for 15 to 30 min. Samples were placed on ice and 250 µl of lysis buffer RLT*
and 250 µl of ethanol (200 proof) added. The samples were then mixed by inversion. The samples
were transferred to Qiagen RNeasy spin columns and centrifuged at 8,000 x g for 15 sec and the
flow through discarded. Columns were placed in new 2 ml collection tubes and washed twice
with 500 µl of RPE* wash buffer and the flow through discarded. Columns were transferred to
new 1.5 ml Eppendorf tubes and RNA was eluted by the addition of 100 µl of DEPC-treated
water followed by centrifugation at 8,000 x g for 15 sec. Residual RNA was collected by adding
an additional 50 µl of RNase-free water to the spin column followed by centrifugation at full
speed for 2 min. 10 µl of the RNA preparation was removed and quantified by the A260/280
method. RNA was stored at -70°C. Yields were found to be 30-100 µg total RNA per 2.0 ml of
fermentation broth.
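
By way of illustration only, the absorbance-based quantification mentioned above can be sketched as follows. The sketch assumes the commonly used conversion of about 40 µg/ml of RNA per A260 unit at a 1 cm path length, and the absorbance reading and dilution factor shown are hypothetical; neither the conversion factor nor the readings are taken from this specification. The 150 µl eluate volume corresponds to the 100 µl and 50 µl elutions described above.

```python
# Illustrative sketch only: estimating RNA concentration and yield from UV absorbance.
# Assumption: 1 A260 unit corresponds to roughly 40 ug/ml of RNA (1 cm path length).
RNA_UG_PER_ML_PER_A260 = 40.0


def rna_concentration_ug_per_ml(a260, dilution_factor):
    """Concentration of the undiluted RNA preparation in ug/ml."""
    return a260 * RNA_UG_PER_ML_PER_A260 * dilution_factor


def total_yield_ug(conc_ug_per_ml, volume_ul):
    """Total RNA in a preparation of the given volume (ul)."""
    return conc_ug_per_ml * volume_ul / 1000.0


def purity_ratio(a260, a280):
    """A260/A280 ratio, commonly used as a rough indicator of protein contamination."""
    return a260 / a280


# Hypothetical reading: sample diluted 100-fold, A260 = 0.10, A280 = 0.05.
conc = rna_concentration_ug_per_ml(0.10, 100)
yield_ug = total_yield_ug(conc, 150)  # 100 ul + 50 ul elutions
print(f"{conc:.0f} ug/ml, {yield_ug:.0f} ug total, A260/A280 = {purity_ratio(0.10, 0.05):.1f}")
```

With these hypothetical readings the computed yield falls within the 30-100 µg range reported above.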

EXAMPLE 11

Quantitative Competitive Reverse Transcription Polymerase
Chain Reaction (QC-RT-PCR) Protocol

QC-RT-PCR is a technique used to quantitate the amount of a specific RNA in an
RNA sample. This technique employs the synthesis of a specific DNA molecule that is
complementary to an RNA molecule in the original sample by reverse transcription and its
subsequent amplification by polymerase chain reaction. By the addition of various amounts of a
competitor RNA molecule to the sample one can determine the concentration of the RNA
molecule of interest (in this case the mRNA transcripts of the CYP and CPR genes). The levels
of specific mRNA transcripts were assayed over time in response to the addition of fatty acid
and/or alkane substrates to the growth medium of fermentation-grown C. tropicalis cultures for
the identification and characterization of the genes involved in the oxidation of these substrates.
This approach can be used to identify the CYP and CPR genes involved in the oxidation of any
given substrate based upon their transcriptional regulation.

A. Primer Design

The first requirement for QC-RT-PCR is the design of the primer pairs to be used
in the reverse transcription and subsequent PCR reactions. These primers need to be unique and
specific to the gene of interest. As there is a family of genetically similar CYP genes present in
C. tropicalis 20336, care had to be taken to design primer pairs that would be discriminating and
only amplify the gene of interest, in this example the CYP52A5 gene. In this manner, unique
primers directed to substantially non-homologous (i.e., variable) regions within target members
of a gene family are constructed. What constitutes substantially non-homologous regions is
determined on a case by case basis. Such unique primers should be specific enough to anneal to the
non-homologous region of the target gene without annealing to other non-target members of the
gene family. By comparing the known sequences of the members of a gene family,
non-homologous regions are identified and unique primers are constructed which will anneal to
those regions. It is contemplated that non-homologous regions herein would typically exhibit less
than about 85% homology but can be more homologous depending on the positions which are
conserved and the stringency of the reaction. After conducting PCR, it may be helpful to check the
reaction product to assure it represents the unique target gene product. If not, the reaction
conditions can be altered in terms of stringency to focus the reaction on the desired target.
Alternatively, a new primer or new non-homologous region can be chosen. Due to the high level
of homology between the genes of the CYP52A family, the most variable 5 prime region of the
CYP52A5 coding sequence was targeted for the design of the primer pairs. In Figure 3, a portion
of the 5 prime coding region for the CYP52A5A (SEQ ID NO: 36) allele of C. tropicalis 20336 is
shown. The boxed sequences in Figure 3 are the sequences of the forward and backward
primers (SEQ ID NOS: 47 and 48) used to quantitate expression of both alleles of this gene. The
actual reverse primer (SEQ ID NO: 48) contains one less adenine than that shown in Figure 3.
Primers used to measure the expression of specific C. tropicalis 20336 genes using the
QC-RT-PCR protocol are listed in Table 5 (SEQ ID NOS: 37-58).
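
By way of illustration only, the comparison of gene family members described above can be sketched as follows. The sketch computes percent identity over a pre-aligned region and checks whether a candidate primer (or its reverse complement) occurs exactly in any family member. The two "family member" sequences and their names are hypothetical toy data constructed around primer 7581-97-F (SEQ ID NO: 47); they are not CYP52 sequences from this specification, and an exact-match search is a simplification of real annealing behavior.

```python
# Illustrative sketch only: screening a candidate primer against a toy gene family.

def percent_identity(a, b):
    """Percent identical positions between two pre-aligned, equal-length sequences.

    Positions where either sequence has a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    matches = sum(x == y for x, y in compared)
    return 100.0 * matches / len(compared)


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def primer_hits(primer, gene_family):
    """Names of family members containing the primer or its reverse complement exactly."""
    fwd = primer.upper()
    rev = reverse_complement(fwd)
    return [name for name, seq in gene_family.items()
            if fwd in seq.upper() or rev in seq.upper()]


# Hypothetical toy sequences (not from the specification).
toy_family = {
    "target_toy": "ATGGCTAAGAGGGCAGGGCTCAAGAGTTCC",
    "paralog_toy": "ATGGCTAAGAGAGCTGGTCTTAAGAGTTCC",
}
primer_47 = "AAGAGGGCAGGGCTCAAGAG"  # primer 7581-97-F, SEQ ID NO: 47

identity = percent_identity(toy_family["target_toy"], toy_family["paralog_toy"])
print(f"identity: {identity:.1f}%  hits: {primer_hits(primer_47, toy_family)}")
```

A region is a candidate for primer placement when its identity to the other family members is low and the primer matches only the intended target.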
Table 5. Primers used to measure C. tropicalis gene expression in
the QC-RT-PCR reactions.

Primer Name  Direction  Target               Sequence
3737-89F     F          CYP52A1A             CCGATGAAGTTTTCGACGAGTACCC (SEQ ID NO: 37)
3737-89B     B          CYP52A1A             AAGGCTTTAACGTGTCCAATCTGGTC (SEQ ID NO: 38)
alk2aF1      F          CYP52A2A             ATTATCGCCACATACTTCACCAAATGG (SEQ ID NO: 39)
alk2aB5      B          CYP52A2A             CGAGATCGTGGATACGCTGGAGTG (SEQ ID NO: 40)
7581-178-3   F          CYP52A3A             GCCACTCGGTAACTTTGTCAGGGAC (SEQ ID NO: 41)
7581-178-4   B          CYP52A3A             CATTGAACTGAGTAGCCAAAACAGCC (SEQ ID NO: 42)
3737-50F     F          CYP52A3A & CYP52A3B  CCTACGTTTGGTATCGCTACTCCGTTG (SEQ ID NO: 43)
3737-50B     B          CYP52A3A & CYP52A3B  TTTCCAGCCAGCACCGTCCAAG (SEQ ID NO: 44)
3737-175F    F          CYP52D4A             GCAGAGCCGATCTATGTTGCGTCC (SEQ ID NO: 45)
3737-175B    B          CYP52D4A             TCATTGAATGCTTCCAGGAACCTCG (SEQ ID NO: 46)
7581-97-F    F          CYP52A5A & CYP52A5B  AAGAGGGCAGGGCTCAAGAG (SEQ ID NO: 47)
7581-97-M    B          CYP52A5A & CYP52A5B  TCCATGTGAAGATCCCATCAC (SEQ ID NO: 48)
4P-2         F          CYP52A8A             CTTGAAGGCCGTGTTGAACG (SEQ ID NO: 49)
4M-1         B          CYP52A8A             CAGGATTTGTCTGAGTTGCCG (SEQ ID NO: 50)
3737-52F     F          POX4A & POX4B        CCATTGCCTTGAGATACGCCATTGGTAG (SEQ ID NO: 51)
3737-52B     B          POX4A & POX4B        AGCCTTGGTGTCGTTCTTTTCAACGG (SEQ ID NO: 52)
3737-53F     F          POX5A                TTGGGTTTGTTTGTTTCCTGTGTCCG (SEQ ID NO: 53)
3737-53B     B          POX5A                CCTTTGACCTTCAATCTGGCGTAGACG (SEQ ID NO: 54)
F33          F          CPRA                 GGTTTGCTGAATACGCTGAAGGTGATG (SEQ ID NO: 55)
B63          B          CPRA                 TGGAGCTGAACAACTCTCTCGTCTCGG (SEQ ID NO: 56)
3737-133F    F          CPRA & CPRB          TTCCTCAACACGGACAGCGG (SEQ ID NO: 57)
3737-133B    B          CPRA & CPRB          AGTCAACCAGGTGTGGAACTCGTC (SEQ ID NO: 58)

F = Forward, B = Backward
B. Design and Synthesis of the Competitor DNA Template

The competitor RNA is synthesized in vitro from a competitor DNA template that
has the T7 polymerase promoter and preferably carries a small deletion of, e.g., about 10 to 25
nucleotides relative to the native target RNA sequence. The DNA template for the in vitro
synthesis of the competitor RNA is synthesized using PCR primers that are between 46 and 60
nucleotides in length. In this example, the primer pairs for the synthesis of the CYP52A5
competitor DNA are shown in Tables 6 and 7 (SEQ ID NOS: 59 and 60).



Table 6. Forward and Reverse primers used to synthesize the competitor RNA template for
the QC-RT-PCR measurement of CYP52A5A gene expression.

Forward Primer  CYP52A5A  GGATCCTAATACGACTCACTATAGGGAGGAAGAGGGCAGGGCTCAAGAG (SEQ ID NO: 59)
Reverse Primer  CYP52A5A  TCCATGTGAAGATCCCATCACGAGTGTGCCTCTTGCCCAAAG (SEQ ID NO: 60)



Table 7. Primers for the synthesis of the QC-RT-PCR competitor RNA templates

Primer Name  Direction  Target               Sequence 5'-3'
3737-89C     F          CYP52A1A             GGATCCTAATACGACTCACTATAGGGAGGCCGATGAAGTTTTCGACGAGTACCC (SEQ ID NO: 61)
3737-89D     B          CYP52A1A             AAGGCTTTAACGTGTCCAATCTGGTCAACATAGCTCTGGAGTGCTTCCAACC (SEQ ID NO: 62)
7581-137-A   F          CYP52A2A             GGATCCTAATACGACTCACTATAGGGAGGATTATCGCCACATACTTCACCAAATGG (SEQ ID NO: 63)
7581-137-B   B          CYP52A2A             CGAGATCGTGGATACGCTGGAGTGCGTCGCTCTTCTTCTTCAACAATTCAAG (SEQ ID NO: 64)
7581-137-D   B          CYP52A3A             CATTGAACTGAGTAGCCAAAACAGCCCATGGTTTCAATCAATGGGAGGC (SEQ ID NO: 65)
7581-137-C   F          CYP52A3A             GGATCCTAATACGACTCACTATAGGGAGGGCCACTCGGTAACTTTGTCAGGGAC (SEQ ID NO: 66)
3737-50-D    F          CYP52A3A & CYP52A3B  GGATCCTAATACGACTCACTATAGGGAGGCCTACGTTTGGTATCGCTACTCCGTTG (SEQ ID NO: 67)
3737-50-C    B          CYP52A3A & CYP52A3B  TTTCCAGCCAGCACCGTCCAAGCAACAAGGAGTACAAGAAATCGTGTC (SEQ ID NO: 68)
3737-175C    F          CYP52D4A             GGATCCTAATACGACTCACTATAGGGAGGGCAGAGCCGATCTATGTTGCGTCC (SEQ ID NO: 69)
3737-175D    B          CYP52D4A             TCATTGAATGCTTCCAGGAACCTCGCCACATCCATCGAGAACCGG (SEQ ID NO: 70)
7581-97-A    F          CYP52A5A & CYP52A5B  GGATCCTAATACGACTCACTATAGGGAGGAAGAGGGCAGGGCTCAAGAG (SEQ ID NO: 59)
7581-97-B    B          CYP52A5A & CYP52A5B  TCCATGTGAAGATCCCATCACGAGTGTGCCTCTTGCCCAAAG (SEQ ID NO: 60)
4P-2/T7      F          CYP52A8A             GGATCCTAATACGACTCACTATAGGGAGGCTTGAAGGCCGTGTTGAACG (SEQ ID NO: 71)
4M-3/4M-1    B          CYP52A8A             CAGGATTTGTCTGAGTTGCCGCCTGATCAAGATAGGATCCTTGCCG (SEQ ID NO: 72)
3737-26-D    F          CPRA                 GGATCCTAATACGACTCACTATAGGGAGGGGTTTGCTGAATACGCTGAAGGTGATG (SEQ ID NO: 73)
3737-26-C    B          CPRA                 TGGAGCTGAACAACTCTCTCGTCTCGGGTGGTCGAATGGACCCTTGGTCAAG (SEQ ID NO: 74)
3737-133C    F          CPRA & CPRB          GGATCCTAATACGACTCACTATAGGGAGGTTCCTCAACACGGACAGCGG (SEQ ID NO: 75)
3737-133D    B          CPRA & CPRB          AGTCAACCAGGTGTGGAACTCGTCGGTGGCAACAATGAAAAACACCAAG (SEQ ID NO: 76)
3737-52-C    F          POX4A & POX4B        GGATCCTAATACGACTCACTATAGGGAGGCCATTGCCTTGAGATACGCCATTGGTAG (SEQ ID NO: 77)
3737-52-D    B          POX4A & POX4B        AGCCTTGGTGTCGTTCTTTTCAACGGAAGGTGGTCTCGATGGTGTGTTCAACC (SEQ ID NO: 78)
3737-53-C    F          POX5A                GGATCCTAATACGACTCACTATAGGGAGGTTGGGTTTGTTTGTTTCCTGTGTCCG (SEQ ID NO: 79)
3737-53-D    B          POX5A                CCTTTGACCTTCAATCTGGCGTAGACGCAGCACCACCGATCCACCACTTG (SEQ ID NO: 80)

F = Forward, B = Backward
The forward primer (SEQ ID NO: 59) contains the T7 promoter consensus sequence
"GGATCCTAATACGACTCACTATAGGGAGG" fused to the primer 7581-97-F sequence
(SEQ ID NO: 47). The reverse primer (SEQ ID NO: 60) contains the sequence of primer
7581-97M (SEQ ID NO: 48) followed by the 20 bases of upstream sequence, with an 18 base pair
deletion between the two blocks of the CYP52A5 sequence. The forward primer was used with
the corresponding reverse primer to synthesize the competitor DNA template. The primer pairs
were combined in a standard Taq Gold polymerase PCR reaction according to the manufacturer's
recommended conditions (Perkin-Elmer/Applied Biosystems, Foster City, CA). The PCR
reaction mix contained a final concentration of 250 nM each primer and 10 ng C. tropicalis
chromosomal DNA for template. The reaction mixture was placed in a
thermocycler for 25 to 35 cycles using the highest annealing temperature possible during the
PCR reactions to assure a homogeneous PCR product (in this case 62°C). The PCR products
were either gel purified or filter purified to remove unincorporated nucleotides and primers.
The competitor template DNA was then quantified using the A260/280 method. Primers used in
QC-RT-PCR experiments for the synthesis of various competitive DNA templates are listed in
Table 7 (SEQ ID NOS: 61-80).
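
By way of illustration only, the construction of the competitor primers described above can be sketched as follows. The sequences are those given for SEQ ID NOS: 47, 48, 59 and 60 in Tables 5 and 6; the function names are hypothetical and are introduced solely for this sketch.

```python
# Illustrative sketch only: assembling the CYP52A5 competitor-template primers.

T7_PROMOTER_TAG = "GGATCCTAATACGACTCACTATAGGGAGG"  # T7 promoter sequence quoted above
PRIMER_7581_97_F = "AAGAGGGCAGGGCTCAAGAG"           # SEQ ID NO: 47
PRIMER_7581_97_M = "TCCATGTGAAGATCCCATCAC"          # SEQ ID NO: 48

# Sequences as listed in Table 6 (each printed there on two lines).
SEQ_ID_59 = ("GGATCCTAATACGACTCACTATAGGGAGGA"
             "AGAGGGCAGGGCTCAAGAG")
SEQ_ID_60 = ("TCCATGTGAAGATCCCATCACGAGTGTGCC"
             "TCTTGCCCAAAG")


def competitor_forward_primer(promoter_tag, forward_primer):
    """Fuse the T7 promoter tag to the 5' end of the gene-specific forward primer."""
    return promoter_tag + forward_primer


def competitor_reverse_primer(reverse_primer, upstream_block):
    """Join the reverse primer directly to a block of target sequence lying further
    upstream; the bases skipped between the two blocks become the deletion."""
    return reverse_primer + upstream_block


def upstream_block_of(competitor_reverse, reverse_primer):
    """Recover the upstream block carried by a competitor reverse primer."""
    if not competitor_reverse.startswith(reverse_primer):
        raise ValueError("competitor primer does not begin with the reverse primer")
    return competitor_reverse[len(reverse_primer):]


forward = competitor_forward_primer(T7_PROMOTER_TAG, PRIMER_7581_97_F)
block = upstream_block_of(SEQ_ID_60, PRIMER_7581_97_M)
print(forward == SEQ_ID_59, block)
```

The same pattern (promoter tag plus forward primer; reverse primer plus an upstream block) is apparent in the other primer pairs of Table 7.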

C. Synthesis of the Competitor RNA

Competitor template DNA was transcribed in vitro to make the competitor RNA
using the Megascript T7 kit from Ambion Biosciences (Ambion Inc., Austin, Texas). 250
nanograms (ng) of competitor DNA template and the in vitro transcription reagents were mixed
according to the directions provided by the manufacturer. The reaction mixture was incubated
for 4 hours at 37°C. The resulting RNA preparations were then checked by gel electrophoresis
for the conditions giving the highest yields and quality of competitor RNA. This often required
optimization according to the manufacturer's specifications. The DNA template was then
removed using DNase I as described in the Ambion kit. The RNA competitor was then
quantified by the A260/280 method. Serial dilutions of the RNA (1 ng/µl to 1 femtogram (fg)/µl)
were made for use in the QC-RT-PCR reactions and the original stocks stored at -70°C.
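
By way of illustration only, a dilution series spanning the stated range can be sketched as follows. The specification gives only the end points (1 ng/µl to 1 fg/µl); the number of points and the resulting ten-fold step used below are assumptions made for this sketch.

```python
# Illustrative sketch only: a geometric dilution series between two concentrations.

def dilution_series(start, end, points):
    """Return `points` concentrations from `start` to `end` (inclusive), evenly
    spaced on a logarithmic scale."""
    if points < 2 or start <= 0 or end <= 0:
        raise ValueError("need at least two points and positive concentrations")
    ratio = (end / start) ** (1.0 / (points - 1))
    return [start * ratio ** i for i in range(points)]


# Assumed example: seven points from 1 ng/ul (1e-9 g/ul) to 1 fg/ul (1e-15 g/ul),
# which corresponds to ten-fold steps.
series_g_per_ul = dilution_series(1e-9, 1e-15, 7)
for conc in series_g_per_ul:
    print(f"{conc:.1e} g/ul")
```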

D. QC-RT-PCR Reactions

QC-RT-PCR reactions were performed using rTth polymerase from Perkin-Elmer
(Perkin-Elmer/Applied Biosystems, Foster City, CA) according to the manufacturer's
recommended conditions. The reverse transcription reaction was performed in a 10 µl volume
with final concentrations of 200 µM for each dNTP, 1.25 units rTth polymerase, 1.0 mM
MnCl2, 1X of the 10X buffer supplied with the enzyme by the manufacturer,
100 ng of total RNA isolated from a fermentor-grown culture of C. tropicalis and 1.25 µM of the
appropriate reverse primer. To quantitate CYP52A5 expression in C. tropicalis an appropriate
reverse primer was 7581-97M (SEQ ID NO: 48). Several reaction mixes were prepared for each
RNA sample characterized. To quantitate CYP52A5 expression a series of 8 to 12 of the
previously described QC-RT-PCR reaction mixes were aliquoted to different reaction tubes. To
each tube 1 µl of a serial dilution containing from 100 pg to 100 fg CYP52A5 competitor RNA
per µl was added, bringing the reaction mixtures up to the final volume of 10 µl. The
QC-RT-PCR reaction mixtures were mixed and incubated at 70°C for 15 min according to the
manufacturer's recommended times for reverse transcription to occur. At the completion of the
15 minute incubation, the sample temperature was reduced to 4°C to stop the reaction and 40 µl
of the PCR reaction mix added to the reaction to bring the total volume up to 50 µl. The PCR
reaction mix consists of an aqueous solution containing 0.3125 µM of the forward primer
7581-97F (SEQ ID NO: 47), 3.125 mM MgCl2 and 1X chelating buffer supplied with the enzyme from
Perkin-Elmer. The reaction mixtures were placed in a thermocycler (Perkin-Elmer GeneAmp
PCR System 2400, Perkin-Elmer/Applied Biosystems, Foster City, CA) and the following PCR
cycle performed: 94°C for 1 min, followed by 94°C for 10 seconds followed by 58°C for 40
seconds for 17 to 22 cycles. The PCR reaction was completed with a final incubation at 58°C for
2 min followed by 4°C. In some reactions where no detectable PCR products were produced the
samples were returned to the thermocycler for additional cycles; this process was repeated until
enough PCR products were produced to quantify using HPLC. The number of cycles necessary
to produce enough PCR product is a function of the amount of the target mRNA in the 100 ng of
total cellular RNA. In cultures where the CYP52A5 gene is highly expressed there is sufficient
CYP52A5 mRNA message present and fewer PCR cycles (<17) are required to produce a
quantifiable amount of PCR product. The lower the concentration of the target mRNA present,
the more PCR cycles are required to produce a detectable amount of product. These
QC-RT-PCR procedures were applied to all the target genes listed in Table 5 using the respective
primers indicated therein.
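
By way of illustration only, the effect of adding 40 µl of PCR reaction mix to the 10 µl reverse transcription reaction can be worked through as a simple dilution (C1 x V1 = C2 x V2). The sketch assumes that the concentrations stated for the reverse transcription step refer to the 10 µl volume and that those stated for the PCR reaction mix refer to the 40 µl added; the resulting 50 µl values are derived here and are not stated in the specification.

```python
# Illustrative sketch only: concentrations after combining the two reaction volumes.

RT_VOLUME_UL = 10.0    # reverse transcription reaction volume
PCR_MIX_UL = 40.0      # PCR reaction mix added after reverse transcription
FINAL_UL = RT_VOLUME_UL + PCR_MIX_UL


def diluted(concentration, volume_in_ul, final_volume_ul):
    """Concentration after bringing `volume_in_ul` up to `final_volume_ul`."""
    return concentration * volume_in_ul / final_volume_ul


final_concentrations = {
    "forward primer (uM)": diluted(0.3125, PCR_MIX_UL, FINAL_UL),
    "MgCl2 (mM)": diluted(3.125, PCR_MIX_UL, FINAL_UL),
    "reverse primer (uM)": diluted(1.25, RT_VOLUME_UL, FINAL_UL),
    "each dNTP (uM)": diluted(200.0, RT_VOLUME_UL, FINAL_UL),
    "MnCl2 (mM)": diluted(1.0, RT_VOLUME_UL, FINAL_UL),
}

for component, value in final_concentrations.items():
    print(f"{component}: {value:g}")
```

Under these assumptions the forward and reverse primers end up at the same concentration in the 50 µl PCR.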



E. HPLC Quantification

Upon completion of the QC-RT-PCR reactions the samples were analyzed and
quantitated by HPLC. Five to fifteen microliters of the QC-RT-PCR reaction mix was injected
into a Waters Bio-Compatible 625 HPLC with an attached Waters 484 tunable detector. The
detector was set to measure a wavelength of 254 nm. The HPLC contained a Sarasep brand
DNASep™ column (Sarasep, Inc., San Jose, CA) which was placed within the oven and the
temperature set for 52°C. The column was installed according to the manufacturer's
recommendation of having 30 cm of heated PEEK tubing installed between the injector and the
column. The system was configured with a Sarasep brand guard column positioned before the
injector. In addition, there was a 0.22 µm filter disk just before the column, within the oven.
Two buffers were used to create an elution gradient to resolve and quantitate the PCR products
from the QC-RT-PCR reactions. Buffer A consists of 0.1 M tri-ethyl ammonium acetate
(TEAA) and 5% acetonitrile (volume to volume). Buffer B consists of 0.1 M TEAA and 25%
acetonitrile (volume to volume). The QC-RT-PCR samples were injected into the HPLC and a
linear gradient of 75% buffer A/25% buffer B to 45% buffer A/55% buffer B was run over 6 min
at a flow rate of 0.85 ml per minute. The QC-RT-PCR product of the competitor RNA, being 18
base pairs smaller, is eluted from the HPLC column before the QC-RT-PCR product from the
CYP52A5 mRNA (U). The amounts of the QC-RT-PCR products were plotted and quantitated with
an attached Waters Corporation 745 data module. The log ratios of the amount of CYP52A5
mRNA QC-RT-PCR product (U) to competitor QC-RT-PCR product (C), as measured by peak
areas, were plotted and the amount of competitor RNA required to equal the amount of CYP52A5
mRNA product determined. In the case of each of the target genes listed in Table 5, the
competitor RNA contained fewer base pairs as compared to the native target mRNA and eluted
before the native mRNA in a manner similar to that demonstrated by CYP52A5. HPLC
quantification of the genes was conducted as above.
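
By way of illustration only, the determination of the equivalence point described above can be sketched as follows. The sketch fits a straight line to log10(U/C) versus log10(competitor RNA added) by ordinary least squares and solves for the competitor amount at which the log ratio is zero (U = C). The least-squares fit is an assumption made for this sketch, since the specification states only that the log ratios were plotted, and the peak areas below are hypothetical.

```python
# Illustrative sketch only: estimating the competitor amount at which U/C = 1.
import math


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("competitor amounts must not all be equal")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def equivalence_point(competitor_amounts, target_areas, competitor_areas):
    """Competitor amount at which the fitted log10(U/C) line crosses zero."""
    xs = [math.log10(amount) for amount in competitor_amounts]
    ys = [math.log10(u / c) for u, c in zip(target_areas, competitor_areas)]
    slope, intercept = fit_line(xs, ys)
    if slope == 0:
        raise ValueError("log ratio does not vary with competitor amount")
    return 10 ** (-intercept / slope)


# Hypothetical data: competitor RNA added (pg) and the resulting HPLC peak areas.
competitor_pg = [0.1, 1.0, 10.0, 100.0]
target_peak_areas = [50.0, 50.0, 50.0, 50.0]          # U
competitor_peak_areas = [1.0, 10.0, 100.0, 1000.0]    # C

estimate = equivalence_point(competitor_pg, target_peak_areas, competitor_peak_areas)
print(f"target mRNA is equivalent to about {estimate:.2f} pg of competitor RNA")
```

The estimate refers to the 100 ng of total RNA used per reaction, so results from different samples can be compared on that basis.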

EXAMPLE 12

Evaluation of New Strains in Shake Flasks

The CYP and CPR amplified strains such as strains HDC10, HDC15, HDC20 and
HDC23 (Table 1) and H5343 were evaluated for diacid production in shake flasks. A single
colony for each strain was transferred from a YPD agar plate into 5 ml of YPD broth and grown
overnight at 30°C, 250 rpm. An inoculum was then transferred into 50 ml of DCA2 medium
(Chart) and grown for 24 h at 30°C, 300 rpm. The cells were centrifuged at 5000 rpm for 5 min
and resuspended in 50 ml of DCA3 medium (Chart) and grown for 24 h at 30°C, 300 rpm. 3%
oleic acid w/v was added after 24 h growth in DCA3 medium and the cultures were allowed to
bioconvert oleic acid for 48 h. Samples were harvested and the diacid and monoacid
concentrations were analyzed as per the scheme given in Figure 35. Each strain was tested in
duplicate and the results shown in Table 8 represent the average value from two flasks.



Table 8. Bioconversion of oleic acid by different recombinant strains of Candida tropicalis

Strain     Conversion to Oleic diacid (%)  Specific Conversion (g diacid/g biomass)
H5343      41.9                            0.53
HDC 10-2   50.5                            0.85
HDC 15     54.4                            0.85
HDC 20-1   45.1                            0.72
HDC 20-2   45.3                            0.58
HDC 23-2   55.2                            0.84
HDC 23-3   58.8                            0.89
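
By way of illustration only, the values of Table 8 can be expressed relative to a reference strain as follows. Treating H5343 as the reference is an assumption made for this sketch; the ratios are simple quotients of the tabulated averages and carry no statistical weight beyond the two flasks per strain.

```python
# Illustrative sketch only: Table 8 values relative to an assumed reference strain.

# strain: (conversion to oleic diacid in %, specific conversion in g diacid/g biomass)
TABLE_8 = {
    "H5343": (41.9, 0.53),
    "HDC 10-2": (50.5, 0.85),
    "HDC 15": (54.4, 0.85),
    "HDC 20-1": (45.1, 0.72),
    "HDC 20-2": (45.3, 0.58),
    "HDC 23-2": (55.2, 0.84),
    "HDC 23-3": (58.8, 0.89),
}


def relative_to(reference, table):
    """Ratio of each strain's values to those of the reference strain."""
    ref_conversion, ref_specific = table[reference]
    return {strain: (conversion / ref_conversion, specific / ref_specific)
            for strain, (conversion, specific) in table.items()}


ratios = relative_to("H5343", TABLE_8)
for strain, (conversion_ratio, specific_ratio) in ratios.items():
    print(f"{strain}: conversion x{conversion_ratio:.2f}, specific conversion x{specific_ratio:.2f}")
```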



EXAMPLE 13

Cloning and Characterization of C. tropicalis 20336 Cytochrome P450
Monooxygenase (CYP) and Cytochrome P450 NADPH Oxidoreductase (CPR) Genes

To clone CYP and CPR genes several different strategies were employed.
Available CYP amino acid sequences were aligned and regions of similarity were observed
(Figure 4). These regions corresponded to described conserved regions seen in other cytochrome
P450 families (Goeptar et al., supra and Kalb et al., supra). Proteins from eight eukaryotic
cytochrome P450 families share a segmented region of sequence similarity. One region
corresponded to the HR2 domain containing the invariant cysteine residue near the carboxyl
terminus which is required for heme binding, while the other region corresponded to the central
region of the I helix thought to be involved in substrate recognition (Figure 4). Degenerate
oligonucleotide primers corresponding to these highly conserved regions of the CYP52 gene
family present in Candida maltosa and Candida tropicalis ATCC 750 were designed and used
to amplify DNA fragments of CYP genes from C. tropicalis 20336 genomic DNA. These
discrete PCR fragments were then used as probes to isolate full-length CYP genes from the C.
tropicalis 20336 genomic libraries. In a few instances oligonucleotide primers corresponding to
highly conserved regions were directly used as probes to isolate full-length CYP genes from
genomic libraries. In the case of CPR a heterologous probe based upon the known DNA
sequence for the CPR gene from C. tropicalis 750 was used to isolate the C. tropicalis 20336
CPR gene.
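
By way of illustration only, the degeneracy of the degenerate oligonucleotide primers listed in Table 4 can be computed from the standard IUPAC nucleotide ambiguity codes, as sketched below. The function names are hypothetical; the primer sequences are those of SEQ ID NOS: 16, 19 and 23.

```python
# Illustrative sketch only: counting and enumerating the sequences in a degenerate primer.
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct unambiguous sequences represented by a degenerate primer."""
    total = 1
    for base in primer.upper():
        total *= len(IUPAC[base])
    return total


def expand(primer):
    """List every unambiguous sequence represented by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combination) for combination in product(*choices)]


primers = {
    "FMN1 (SEQ ID NO: 16)": "TCYCAAACWGGTACWGCWGAA",
    "NADPH (SEQ ID NO: 19)": "GCMACACCRGTACCTGGACC",
    "UCup1 (SEQ ID NO: 23)": "CGNGAYACNACNGCNGG",
}
for name, sequence in primers.items():
    print(f"{name}: {degeneracy(sequence)}-fold degenerate")
```

Primers containing several N positions, such as UCup1, represent a much larger pool of sequences than primers restricted to two-base ambiguity codes.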



A. Cloning of the CPR Gene from C. tropicalis 20336

1) Cloning of the CPRA Allele

Approximately 25,000 phage particles from the first genomic library of C.
tropicalis 20336 were screened with a 1.9 kb BamHI-NdeI fragment from plasmid pCU3RED
(See Picataggio et al., Bio/Technology 10:894-898 (1992), incorporated herein by reference)
containing most of the C. tropicalis 750 CPR gene. Five clones that hybridized to the probe
were isolated and the plasmid DNA from these lambda clones was rescued and characterized by
restriction enzyme analysis. The restriction enzyme analysis suggested that all five clones were
identical but it was not clear that a complete CPR gene was present.

PCR analysis was used to determine if a complete CPR gene was present in any of
the five clones. Degenerate primers were prepared for highly conserved regions of known CPR
genes (See Sutter et al., J. Biol. Chem. 265:16428-16436 (1990), incorporated herein by
reference) (Figure 4). Two primers were synthesized for the FMN binding region (FMN1, SEQ
ID NO: 16 and FMN2, SEQ ID NO: 17). One primer was synthesized for the FAD binding
region (FAD, SEQ ID NO: 18), and one primer for the NADPH binding region (NADPH, SEQ
ID NO: 19) (Table 4). These four primers were used in PCR amplification experiments using as
a template plasmid DNA isolated from four of the five clones described above. The FMN (SEQ
ID NOS: 16 and 17) and FAD (SEQ ID NO: 18) primers served as forward primers and the
NADPH primer (SEQ ID NO: 19) as the reverse primer in the PCR reactions. When different
combinations of forward and reverse primers were used, no PCR products were obtained from
any of the plasmids. However, all primer combinations amplified expected size products with a
plasmid containing the C. tropicalis 750 CPR gene (positive control). The most likely reason for
the failure of the primer pairs to amplify a product was that all four clones contained a
truncated CPR gene. One of the four clones (pHKM1) was sequenced using the Triplex 5'
(SEQ ID NO: 30) and the Triplex 3' (SEQ ID NO: 31) primers (Table 4), which flank the insert
and the multiple cloning site on the cloning vector, and with the degenerate primer based upon
the NADPH binding site described above. The NADPH primer (SEQ ID NO: 19) failed to yield
any sequence data and this is consistent with the PCR analysis. Sequences obtained with Triplex
primers were compared with the C. tropicalis 750 CPR sequence using the MacVector™ program
(Oxford Molecular Group, Campbell, CA). Sequence obtained with the Triplex 3' primer (SEQ
ID NO: 31) showed similarity to an internal sequence of the C. tropicalis 750 CPR gene,
confirming that pHKM1 contained a truncated version of a 20336 CPR gene. pHKM1 had a 3.8
kb insert which included a 1.2 kb coding region of the CPR gene accompanied by 2.5 kb of
upstream DNA (Figure 5). Approximately 0.85 kb of the 20336 CPR gene encoding the
C-terminal portion of the CPR protein is missing from this clone.

Since the first Clontech library yielded only a truncated CPR gene, the second
library prepared by Clontech was screened to isolate a full-length CPR gene. Three putative
CPR clones were obtained. The three clones, having inserts in the range of 5-7 kb, were
designated pHKM2, pHKM3 and pHKM4. All three were characterized by PCR using the
degenerate primers described above. Both pHKM2 and pHKM4 gave PCR products with two
sets of internal primers. pHKM3 gave a PCR product only with the FAD (SEQ ID NO: 18) and
NADPH (SEQ ID NO: 19) primers, suggesting that this clone likely contained a truncated CPR
gene. All three plasmids were partially sequenced using the two Triplex primers and a third
primer whose sequence was selected from the DNA sequence near the truncated end of the CPR
gene present in pHKM1. This analysis confirmed that both pHKM2 and pHKM4 have sequences
that overlap pHKM1 and that both contained the 3' region of the CPR gene that is missing from
pHKM1. Portions of inserts from pHKM1 and pHKM4 were sequenced and a full-length CPR
gene was identified. Based on the DNA sequence and PCR analysis, it was concluded that
pHKM1 contained the putative promoter region and 1.2 kb of sequence encoding a portion (5'
end) of a CPR gene. pHKM4 had 1.1 kb of DNA that overlapped pHKM1 and contained the
remainder (3' end) of a CPR gene along with a downstream untranslated region (Figure 6).
Together these two plasmids contained a complete CPRA gene with an upstream promoter
region. CPRA is 4206 nucleotides in length (SEQ ID NO: 81) and includes a regulatory region
and a protein coding region (defined by nucleotides 1006-3042) which is 2037 base pairs in
length and codes for a putative protein of 679 amino acids (SEQ ID NO: 83) (Figures 13 and
14). In Figure 13, the asterisks denote conserved nucleotides between CPRA and CPRB, bold
denotes protein coding nucleotides, and the start and stop codons are underlined. The CPRA
protein, when analyzed by the protein alignment program of the GeneWorks™ software package
(Oxford Molecular Group, Campbell, CA), showed extensive homology to CPR proteins from C.
tropicalis 750 and C. maltosa.



2) Cloning of the CPRB Allele

To clone the second allele, CPRB, the third genomic library, prepared by Henkel,
was screened using DNA fragments from pHKM1 and pHKM4 as probes. Five clones were
obtained and these were sequenced with the three internal primers used to sequence CPRA,
designated PRK1.F3 (SEQ ID NO: 20), PRK1.F5 (SEQ ID NO: 21) and
PRK4.R20 (SEQ ID NO: 22) (Table 4), and the two outside primers (M13 -20 and T3
[Stratagene]) for the polylinker region present in the pBK-CMV cloning vector. Sequence
analysis suggested that four of these clones, designated pHKM5 to pHKM8, contained inserts
which were identical to the CPRA allele isolated earlier. All four seemed to contain a full-length
CPR gene. The fifth clone was very similar to the CPRA allele, especially in the open reading
frame region where the identity was very high. However, there were significant differences in the
5' and 3' untranslated regions. This suggested that the fifth clone was the allele to CPRA. The
plasmid was designated pHKM9 (Figure 7) and a 4.14 kb region of this plasmid was sequenced.
Analysis of this sequence confirmed the presence of the CPRB allele (SEQ ID NO: 82),
which includes a regulatory region and a protein coding region (defined by nucleotides
1033-3069) (Figure 13). The amino acid sequence of the CPRB protein is set forth in SEQ ID NO: 84
(Figure 14).
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B. Cloning of C. tropicalis 20336 (CYP) Genes 
1) Cloning of CYP52A2A, CYP52A3A & 3B and CYP52A5A & 5B

Clones carrying CYP52A2A, A3A, A3B, A5A and A5B genes were
isolated from the first and second Clontech genomic libraries using an oligonucleotide probe
(HemeB1, SEQ ID NO: 27) whose sequence was based upon the amino acid sequence for the
highly conserved heme binding region present throughout the CYP52 family. The first and
second libraries were converted to the plasmid form and screened by colony hybridizations
using the HemeB1 probe (SEQ ID NO: 27) (Table 4). Several potential clones were isolated and
the plasmid DNA was isolated from these clones and sequenced using the HemeB1
oligonucleotide (SEQ ID NO: 27) as a primer. This approach succeeded in identifying five
CYP52 genes. Three of the CYP genes appeared unique, while the remaining two were classified
as alleles. Based upon an arbitrary choice of homology to CYP52 genes from Candida maltosa,
these five genes and corresponding plasmids were designated CYP52A2A (pPA15 [Figure 26]),
CYP52A3A (pPA57 [Figure 29]), CYP52A3B (pPA62 [Figure 30]), CYP52A5A (pPAL3 [Figure
31]) and CYP52A5B (pPA5 [Figure 32]). The complete DNA sequence including regulatory and
protein coding regions of these five genes was obtained and confirmed that all five were CYP52
genes (Figure 15). In Figure 15, the asterisks denote conserved nucleotides among the CYP
genes. Bold indicates the protein coding nucleotides of the CYP genes, and the start and stop
codons are underlined. The CYP52A2A gene as represented by SEQ ID NO: 86 has a protein
coding region defined by nucleotides 1199-2767 and the encoded protein has an amino acid
sequence as set forth in SEQ ID NO: 96. The CYP52A3A gene as represented by SEQ ID NO:
88 has a protein coding region defined by nucleotides 1126-2748 and the encoded protein has
an amino acid sequence as set forth in SEQ ID NO: 98. The CYP52A3B gene as represented by
SEQ ID NO: 89 has a protein coding region defined by nucleotides 913-2535 and the encoded
protein has an amino acid sequence as set forth in SEQ ID NO: 99. The CYP52A5A gene as
represented by SEQ ID NO: 90 has a protein coding region defined by nucleotides 1103-2656
and the encoded protein has an amino acid sequence as set forth in SEQ ID NO: 100. The
CYP52A5B gene as represented by SEQ ID NO: 91 has a protein coding region defined by
nucleotides 1142-2695 and the encoded protein has an amino acid sequence as set forth in
SEQ ID NO: 101.
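
As a quick consistency check on the coordinates quoted above, the following minimal Python sketch computes the length of each protein coding region from its first and last nucleotide positions and confirms that each length is a whole number of codons. The names CODING_REGIONS, region_length and codon_count are illustrative and are not part of the disclosure; the coordinates are copied from the text above. Because the text does not restate whether the last codon of each region is the stop codon, the sketch reports codon counts rather than protein lengths.

```python
# Coding-region coordinates (1-based, inclusive) as quoted in the text above.
# The names used here are illustrative only.
CODING_REGIONS = {
    "CYP52A2A": (1199, 2767),
    "CYP52A3A": (1126, 2748),
    "CYP52A3B": (913, 2535),
    "CYP52A5A": (1103, 2656),
    "CYP52A5B": (1142, 2695),
}


def region_length(first, last):
    """Length in nucleotides of an inclusive, 1-based interval."""
    return last - first + 1


def codon_count(first, last):
    """Number of codons in the interval; raises if it is not a multiple of 3."""
    length = region_length(first, last)
    if length % 3 != 0:
        raise ValueError(f"{length} nt is not a whole number of codons")
    return length // 3


if __name__ == "__main__":
    for gene, (first, last) in CODING_REGIONS.items():
        print(gene, region_length(first, last), "nt,",
              codon_count(first, last), "codons")
```

The same two functions can be applied to any other coordinate pair given in this Example.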

2) Cloning of CYP52A1A and CYP52A8A

CYP52A1A and CYP52A8A genes were isolated from the third genomic library
using PCR fragments as probes. The PCR fragment probe for CYP52A1 was generated after
PCR amplification of 20336 genomic DNA with oligonucleotide primers that were designed to
amplify a region from the Helix I region to the HR2 region using all available CYP52 genes from
the National Center for Biotechnology Information. Degenerate forward primers UCup1 (SEQ ID
NO: 23) and UCup2 (SEQ ID NO: 24) were designed based upon an amino acid sequence
(-RDTTAG-) from the Helix I region (Table 4). Degenerate primers UCdown1 (SEQ ID NO: 25)
and UCdown2 (SEQ ID NO: 26) were designed based upon an amino acid sequence (-GQQFAL-)
from the HR2 region (Table 4). For the reverse primers, the DNA sequence represents the
reverse complement of the sequence encoding the corresponding amino acid sequence. These
primers were used in pairwise combinations in a PCR reaction with Stoffel Taq DNA polymerase
(Perkin-Elmer Cetus, Foster City, CA) according to the manufacturer's recommended procedure.
A PCR product of approximately 450 bp was obtained. This product was purified from agarose gel
using Gene-clean™ (Bio 101, La Jolla, CA) and ligated to the pTAG™ vector (Figure 17) (R&D
Systems, Minneapolis, MN) according to the recommendations of the manufacturer. No
treatment of the PCR product was necessary to clone into pTAG because the vector uses the TA
cloning technique. Plasmids from several transformants were isolated and their inserts were
characterized. One plasmid contained the PCR clone intact. The DNA sequence of the PCR
fragment (designated 44CYP3, SEQ ID NO: 107) shared homology with the DNA sequences for
the CYP52A1 gene of C. maltosa and the CYP52A3 gene of C. tropicalis 750. This fragment
was used as a probe in isolating the C. tropicalis 20336 CYP52A1 homolog. The third genomic
library was screened using the 44CYP3 PCR probe (SEQ ID NO: 107) and a clone (pHKM11)
that contained a full-length CYP52 gene was obtained (Figure 8). The clone contained a gene
having regulatory and protein coding regions. An open reading frame of 1572 nucleotides
encoded a CYP52 protein of 523 amino acids (Figures 15 and 16). This CYP52 gene was
designated CYP52A1A (SEQ ID NO: 85) since its putative amino acid sequence (SEQ ID NO:
95) was most similar to the CYP52A1 protein of C. maltosa. The protein coding region of the
CYP52A1A gene is defined by nucleotides 1177-2748 of SEQ ID NO: 85.
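
The degenerate primer design described above (forward primers based on -RDTTAG-, reverse primers taken as the reverse complement of the sequence encoding -GQQFAL-) can be illustrated with a short Python sketch. It counts how many distinct coding sequences the standard genetic code allows for a peptide and reverse-complements a sequence written in IUPAC degenerate notation. The example sequence is a hypothetical back-translation of the residues G-Q-Q-F-A chosen for illustration only; it is not one of the primers of Table 4, and the function names are assumptions.

```python
# Number of codons per amino acid in the standard genetic code.
CODONS_PER_RESIDUE = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}

# Complements of the IUPAC nucleotide codes (e.g. R = A/G pairs with Y = C/T).
IUPAC_COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "R": "Y", "Y": "R", "S": "S", "W": "W", "K": "M", "M": "K",
    "B": "V", "V": "B", "D": "H", "H": "D", "N": "N",
}


def degeneracy(peptide):
    """Number of distinct DNA sequences that can encode the peptide."""
    total = 1
    for residue in peptide.upper():
        total *= CODONS_PER_RESIDUE[residue]
    return total


def reverse_complement(sequence):
    """Reverse complement of a DNA sequence, IUPAC degenerate codes allowed."""
    return "".join(IUPAC_COMPLEMENT[base] for base in reversed(sequence.upper()))


if __name__ == "__main__":
    print("RDTTAG degeneracy:", degeneracy("RDTTAG"))
    print("GQQFAL degeneracy:", degeneracy("GQQFAL"))
    # Hypothetical degenerate back-translation of G-Q-Q-F-A (not from Table 4).
    sense = "GGNCARCARTTYGCN"
    print("reverse primer orientation:", reverse_complement(sense))
```

High degeneracy is one reason more than one primer per region may be designed, each covering a subset of the possible codons; the text does not state the rationale, so this remark is offered only as general background.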

A similar approach was taken to clone CYP52A8A. A PCR fragment probe for
CYP52A8 was generated using primers for highly conserved sequences of CYP52A3, CYP52A2
and CYP52A5 genes of C. tropicalis 750. The reverse primer (primer 2,3,5,M) (SEQ ID NO: 29)
was designed based on the highly conserved heme binding region (Table 4). The design of the
forward primer (primer 2,3,5,P) (SEQ ID NO: 28) was based upon a sequence conserved near
the N-terminus of the CYP52A3, CYP52A2 and CYP52A5 genes from C. tropicalis 750 (Table
4). Amplification of 20336 genomic DNA with these two primers gave a mixed PCR product.
One amplified PCR fragment was 1006 bp long (designated DCA1002). The DNA sequence for
this fragment was determined and was found to have 85% identity to the DNA sequence for the
CYP52D4 gene of C. tropicalis 750. When this PCR product was used to screen the third
genomic library, one clone (pHKM12) was identified that contained a full-length CYP52 gene
along with 5' and 3' flanking sequences (Figure 9). The CYP52 gene included regulatory and
protein coding regions with an open reading frame 1539 nucleotides long which encoded a
putative CYP52 protein of 512 amino acids (Figures 15 and 16). This gene was designated as
CYP52A8A (SEQ ID NO: 92) since its amino acid sequence (SEQ ID NO: 102) was most
similar to the CYP52A8 protein of C. maltosa. The protein coding region of the CYP52A8A gene
is defined by nucleotides 464-2002 of SEQ ID NO: 92. The amino acid sequence of the
CYP52A8A protein is set forth in SEQ ID NO: 102.

3) Cloning of CYP52D4A 

The screening of the second genomic library with the HemeB1 (SEQ ID NO: 27)
primer (Table 4) yielded a clone carrying a plasmid (pPA18) that contained a truncated gene
having homology with the CYP52D4 gene of C. maltosa (Figure 33). A 1.3 to 1.5 kb EcoRI-SstI
fragment from pPA18 containing part of the truncated CYP gene was isolated and used as a
probe to screen the third genomic library for a full-length CYP52 gene. One clone (pHKM13)
was isolated and found to contain a full-length CYP gene with extensive 5' and 3' flanking
sequences (Figure 10). This gene has been designated as CYP52D4A (SEQ ID NO: 94) and the
complete DNA including regulatory and protein coding regions (coding region defined by
nucleotides 767-2266) and putative amino acid sequence (SEQ ID NO: 104) of this gene are
shown in Figures 15 and 16. CYP52D4A (SEQ ID NO: 94) shares the greatest homology with
the CYP52D4 gene of C. maltosa.

4) Cloning of CYP52A2B and CYP52A8B

A mixed probe containing CYP52A1A, A2A, A3A, D4A, A5A and A8A genes was
used to screen the third genomic library and several putative positive clones were identified.
Seven of these were sequenced with the degenerate primers Cyp52a (SEQ ID NO: 32), Cyp52b
(SEQ ID NO: 33), Cyp52c (SEQ ID NO: 34) and Cyp52d (SEQ ID NO: 35) shown in Table 4.
These primers were designed from highly conserved regions of the four CYP52 subfamilies,
namely CYP52A, B, C & D. Sequences from two clones, pHKM14 and pHKM15 (Figures 11 and
12), shared considerable homology with the DNA sequence of the C. tropicalis 20336 CYP52A2
and CYP52A8 genes, respectively. The complete DNA (SEQ ID NO: 87) including regulatory
and protein coding regions (coding region defined by nucleotides 1072-2640) and putative amino
acid sequence (SEQ ID NO: 97) of the CYP52 gene present in pHKM14 suggested that it is
CYP52A2B (Figures 15 and 16). The complete DNA (SEQ ID NO: 93) including regulatory and
protein coding regions (coding region defined by nucleotides 1017-2555) and putative amino
acid sequence (SEQ ID NO: 103) of the CYP52 gene present in pHKM15 suggested that it is
CYP52A8B (Figures 15 and 16).

EXAMPLE 14 

Identification of CYP and CPR Genes Induced by
Selected Fatty Acid and Alkane Substrates

Genes whose transcription is turned on by the presence of selected fatty
acid or alkane substrates have been identified using the QC-RT-PCR assay. This assay was used
to measure CYP and CPR gene expression in fermentor-grown cultures of C. tropicalis ATCC
20962. This method involves the isolation of total cellular RNA from cultures of C. tropicalis
and the quantification of a specific mRNA within that sample through the design and use of
sequence-specific QC-RT-PCR primers and an RNA competitor. Quantification is achieved
through the use of known concentrations of highly homologous competitor RNA in the QC-RT-PCR
reactions. The resulting QC-RT-PCR amplified cDNAs are separated and quantitated
through the use of ion pairing reverse phase HPLC. This assay was used to characterize the
expression of CYP52 genes of C. tropicalis ATCC 20962 in response to various fatty acid and
alkane substrates. Genes which were induced were identified by the calculation of their mRNA
concentration at various times before and after induction. Figure 18 provides an example of
how the concentration of mRNA for CYP52A5 can be calculated using the QC-RT-PCR assay.
The log ratio of unknown (U) to competitor product (C) is plotted versus the concentration of
competitor RNA present in the QC-RT-PCR reactions. The concentration of competitor which
results in a log ratio of U/C of zero represents the point where the unknown messenger RNA
concentration is equal to the concentration of the competitor. Figure 18 allows for the
calculation of the amount of CYP52A5 message present in 100 ng of total RNA isolated from
cell samples taken at 0, 1, and 2 hours after the addition of Emersol® 267 in a fermentor run.
From this analysis, it is possible to determine the concentration of the CYP52A5 mRNA present
in 100 ng of total cellular RNA. In the plot contained in Figure 18 it takes 0.46 pg of competitor
to equal the number of mRNAs of CYP52A5 in 100 ng of RNA isolated from cells just prior
(time 0) to the addition of the substrate, Emersol® 267. In cell samples taken at one and two
hours after the addition of Emersol® 267 it takes 5.5 and 8.5 pg of competitor RNA,
respectively. This result demonstrates that CYP52A5 (SEQ ID NOS: 90 and 91) is induced more
than 18 fold within two hours after the addition of Emersol® 267. This type of analysis was
used to demonstrate that CYP52A5 (SEQ ID NOS: 90 and 91) is induced by Emersol® 267.
Figure 19 shows the relative amounts of CYP52A5 (SEQ ID NOS: 90 and 91) expression in
fermentor runs with and without Emersol® 267 as a substrate. The differences in the CYP52A5
(SEQ ID NOS: 90 and 91) expression patterns are due to the addition of Emersol® 267 to the
fermentation medium.
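
The equivalence-point calculation described above can be sketched in Python. The sketch fits a straight line to log10(U/C) against log10 of the competitor amount and solves for the competitor amount at which log10(U/C) is zero; it then computes fold induction from the equivalence points quoted in the text (0.46, 5.5 and 8.5 pg). Plotting against the logarithm of the competitor amount, the least-squares fit, and all function names are assumptions made for illustration. The titration points in the usage example are synthetic and are not data from Figure 18.

```python
import math


def equivalence_point(competitor_pg, ratio_u_over_c):
    """Competitor amount (pg) at which log10(U/C) = 0.

    Fits log10(U/C) = slope * log10(competitor) + intercept by ordinary
    least squares and returns the x-intercept on the original scale.
    """
    xs = [math.log10(c) for c in competitor_pg]
    ys = [math.log10(r) for r in ratio_u_over_c]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return 10 ** (-intercept / slope)


def fold_induction(induced_pg, baseline_pg):
    """Ratio of the equivalence point after induction to that at time 0."""
    return induced_pg / baseline_pg


if __name__ == "__main__":
    # Synthetic, idealized titration in which the unknown equals 0.46 pg.
    competitor = [0.1, 1.0, 10.0]
    ratios = [0.46 / c for c in competitor]
    print("equivalence point (pg):", equivalence_point(competitor, ratios))
    # Equivalence points quoted in the text for 0, 1 and 2 hours.
    print("fold induction at 1 hr:", fold_induction(5.5, 0.46))
    print("fold induction at 2 hr:", fold_induction(8.5, 0.46))
```

With the quoted values, the two-hour ratio 8.5/0.46 exceeds 18, consistent with the statement that induction is more than 18 fold.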

This analysis clearly demonstrates that expression of CYP52A5 (SEQ ID NOS: 90
and 91) in C. tropicalis 20962 is inducible by the addition of Emersol® 267 to the growth
medium. The same analysis was performed to characterize the expression of CYP52A2A (SEQ ID
NO: 86), CYP52A3AB (SEQ ID NOS: 88 and 89), CYP52A8A (SEQ ID NO: 92), CYP52A1A
(SEQ ID NO: 85), CYP52D4A (SEQ ID NO: 94) and CPRB (SEQ ID NO: 82) in response to the
presence of Emersol® 267 in the fermentation medium (Figure 20). The results of these
analyses indicate that, like the CYP52A5 gene (SEQ ID NOS: 90 and 91) of C. tropicalis 20962,
the CYP52A2A gene (SEQ ID NO: 86) is inducible by Emersol® 267. A small induction is
observed for CYP52A1A (SEQ ID NO: 85) and CYP52A8A (SEQ ID NO: 92). In contrast, any
induction for CYP52D4A (SEQ ID NO: 94), CYP52A3A (SEQ ID NO: 88) and CYP52A3B (SEQ ID
NO: 89) is below the level of detection of the assay. CPRB (SEQ ID NO: 82) is moderately
induced by Emersol® 267, four to five fold. The results of these analyses are summarized in
Figure 20. Figure 34 provides an example of selective induction of CYP52A genes. When pure
fatty acid or alkanes are spiked into a fermentor containing C. tropicalis 20962 or a derivative
thereof, the transcriptional activation of CYP52A genes is detected using the QC-RT-PCR
assay. Figure 34 shows that pure oleic acid (C18:1) strongly induces CYP52A2A (SEQ ID NO:
86) while also inducing CYP52A5 (SEQ ID NOS: 90 and 91). In the same fermentor, addition of
pure alkane (tridecane) shows strong induction of both CYP52A2A (SEQ ID NO: 86) and CYP52A1A
(SEQ ID NO: 85). However, tridecane did not induce CYP52A5 (SEQ ID NOS: 90 and 91). In
a separate fermentation using ATCC 20962, containing pure octadecane as the substrate,
induction of CYP52A2A, CYP52A5A and CYP52A1A is detected (see Figure 36). The foregoing
demonstrates selective induction of particular CYP genes by specific substrates, thus providing
techniques for selective metabolic engineering of cell strains. For example, if tridecane
modification is desired, organisms engineered for high levels of CYP52A2A (SEQ ID NO: 86)
and CYP52A1A (SEQ ID NO: 85) activity are indicated. If oleic acid modification is desired,
organisms engineered for high levels of CYP52A2A (SEQ ID NO: 86) activity are indicated.
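
The substrate-specific induction pattern summarized above lends itself to a simple lookup. The following Python sketch records, for the three pure substrates discussed, which CYP52A genes the text reports as induced, and answers two questions: which genes respond to a given substrate, and which substrates induce a given gene. The data structure and function names are illustrative assumptions; the table restates only the qualitative observations given above and carries no quantitative information.

```python
# Genes reported as induced by each pure substrate in the text above.
# Gene labels are copied as written there (CYP52A5 for oleic acid,
# CYP52A5A for octadecane).
INDUCED_GENES = {
    "oleic acid": ["CYP52A2A", "CYP52A5"],
    "tridecane": ["CYP52A2A", "CYP52A1A"],
    "octadecane": ["CYP52A2A", "CYP52A5A", "CYP52A1A"],
}


def genes_induced_by(substrate):
    """Genes reported as induced by the substrate (empty list if unknown)."""
    return list(INDUCED_GENES.get(substrate.lower(), []))


def substrates_inducing(gene):
    """Substrates, in alphabetical order, reported to induce the gene."""
    return sorted(s for s, genes in INDUCED_GENES.items() if gene in genes)


if __name__ == "__main__":
    print(genes_induced_by("tridecane"))
    print(substrates_inducing("CYP52A2A"))
```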


EXAMPLE 15 

Integration of Selected CYP and CPR Genes 
into the Genome of Candida tropicalis 

In order to integrate selected genes into the chromosome of C. tropicalis 20336 or
its descendants, there has to be a target DNA sequence, which may or may not be an intact gene,
into which the genes can be inserted. There must also be a method to select for the integration
event. In some cases the target DNA sequence and the selectable marker are the same and, if so,
then there must also be a method to regain use of the target gene as a selectable marker following
the integration event. In C. tropicalis and its descendants, one gene which fits these criteria is
URA3A, encoding orotidine-5'-phosphate decarboxylase. Using it as a target for integration, ura-
variants of C. tropicalis can be transformed in such a way as to regenerate a URA+ genotype via
homologous recombination (Figure 21). Depending upon the design of the integration vector,
one or more genes can be integrated into the genome at the same time. Using a split URA3A
gene oriented as shown in Figure 22, homologous integration would yield at least one copy of the
gene(s) of interest which are inserted between the split portions of the URA3A gene. Moreover,
because of the high sequence similarity between URA3A and URA3B genes, integration of the
construct can occur at both the URA3A and URA3B loci. Subsequently, an oligonucleotide
designed with a deletion in a portion of the URA gene, based on the identical sequence across
both the URA3A and URA3B genes, can be utilized to yield C. tropicalis transformants which
are once again ura- but which still carry one or more newly integrated genes of choice (Figure
21). ura- variants of C. tropicalis can also be isolated via other methods such as classical
mutagenesis or by spontaneous mutation. Using well established protocols, selection of ura-
strains can be facilitated by the use of 5-fluoroorotic acid (5-FOA) as described, e.g., in Boeke et
al., Mol. Gen. Genet. 197:345-346 (1984), incorporated herein by reference. The utility of this
approach for the manipulation of C. tropicalis has been well documented as described, e.g., in
Picataggio et al., Mol. and Cell. Biol. 11:4333-4339 (1991); Rohrer et al., Appl. Microbiol.
Biotechnol. 36:650-654 (1992); Picataggio et al., Bio/Technology 10:894-898 (1992); U.S. Patent
No. 5,648,247; U.S. Patent No. 5,620,878; U.S. Patent No. 5,204,252; U.S. Patent No.
5,254,466, all of which are incorporated herein by reference.



A. Construction of a URA Integration Vector, pURAin

Primers were designed and synthesized based on the 1712 bp sequence of the
URA3A gene of C. tropicalis 20336 (see Figure 23). The nucleotide sequence of the URA3A
gene of C. tropicalis 20336 is set forth in SEQ ID NO: 105 and the amino acid sequence of the
encoded protein is set forth in SEQ ID NO: 106. URA3A Primer Set #1a (SEQ ID NO: 9) and
#1b (SEQ ID NO: 10) (Table 4) was used in PCR with C. tropicalis 20336 genomic DNA to
amplify URA3A sequences between nucleotides 733 and 1688 as shown in Figure 23. The
primers are designed to introduce unique 5' AscI and 3' PacI restriction sites into the resulting
amplified URA3A fragment. AscI and PacI sites were chosen because these sites are not present
within CYP or CPR genes identified to date. URA3A Primer Set #2 was used in PCR with C.
tropicalis 20336 genomic DNA as a template to amplify URA3A sequences between nucleotides
9 and 758 as shown in Figure 23. URA3A Primer Set #2a (SEQ ID NO: 11) and #2b (SEQ ID
NO: 12) (Table 4) was designed to introduce unique 5' PacI and 3' PmeI restriction sites into the
resulting amplified URA3A fragment. The PmeI site is also not present within CYP and CPR
genes identified to date. PCR fragments of the URA3A gene were purified, restricted with AscI,
PacI and PmeI restriction enzymes and ligated to a gel purified, QiaexII cleaned AscI-PmeI
digest of plasmid pNEB193 (Figure 25) purchased from New England Biolabs (Beverly, MA).
The ligation was performed with an equimolar number of DNA termini at 16 °C for 16 hr using
T4 DNA ligase (New England Biolabs). Ligations were transformed into E. coli XL1-Blue cells
(Stratagene, La Jolla, CA) according to the manufacturer's recommendations. White colonies were
isolated and grown, and plasmid DNA was isolated and digested with AscI-PmeI to confirm
insertion of the modified URA3A into pNEB193. The resulting base integration vector was named
pURAin (Figure 24).
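
The choice of AscI, PacI and PmeI rests on their recognition sites being absent from the genes to be inserted. A minimal Python sketch of such a check follows. The recognition sequences used (AscI GGCGCGCC, PacI TTAATTAA, PmeI GTTTAAAC) are the standard published sites for these enzymes and are not stated in the text above; the function name and the toy sequences in the usage example are hypothetical. Because all three sites are palindromic, scanning one strand is sufficient.

```python
# Standard recognition sequences (an assumption relative to the text above,
# which names the enzymes but does not list their sites).
RECOGNITION_SITES = {
    "AscI": "GGCGCGCC",
    "PacI": "TTAATTAA",
    "PmeI": "GTTTAAAC",
}


def find_sites(sequence):
    """Return {enzyme: [0-based start positions]} for every site present."""
    seq = sequence.upper()
    hits = {}
    for enzyme, site in RECOGNITION_SITES.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start)
            start = seq.find(site, start + 1)  # allow overlapping matches
        if positions:
            hits[enzyme] = positions
    return hits


if __name__ == "__main__":
    # Toy sequences for illustration; not taken from any SEQ ID NO.
    print(find_sites("ATGCATGCATGC"))
    print(find_sites("ACG" + "TTAATTAA" + "GG"))
```

An empty result for a candidate gene sequence indicates that none of the three enzymes would cut within it.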

B. Amplification of CYP52A2A, CYP52A3A, CYP52A5A and
CPRB from C. tropicalis 20336 Genomic DNA

The genes encoding CYP52A2A (SEQ ID NO: 86) and CYP52A3A (SEQ ID NO:
88) from C. tropicalis 20336 were amplified from genomic clones (pPA15 and pPA57,
respectively) (Figures 26 and 29) via PCR using primers (Primer CYP 2A#1, SEQ ID NO: 1 and
Primer CYP 2A#2, SEQ ID NO: 2 for CYP52A2A) (Primer CYP 3A#1, SEQ ID NO: 3 and
Primer CYP 3A#2, SEQ ID NO: 4 for CYP52A3A) to introduce PacI cloning sites. These PCR
primers were designed based upon the DNA sequence determined for CYP52A2A (SEQ ID NO:
86) (Figure 15). The AmpliTaq Gold PCR kit (Perkin Elmer Cetus, Foster City, CA) was used
according to the manufacturer's specifications. The CYP52A2A PCR amplification product was
2,230 base pairs in length, yielding 496 bp of DNA upstream of the CYP52A2A start codon and
168 bp downstream of the stop codon for the CYP52A2A ORF. The CYP52A3A PCR amplification
product was 2154 base pairs in length, yielding 437 bp of DNA upstream of the CYP52A3A start
codon and 97 bp downstream of the stop codon for the CYP52A3A ORF.

The gene encoding CYP52A5A (SEQ ID NO: 90) from C. tropicalis 20336 was
amplified from genomic DNA via PCR using primers (Primer CYP 5A#1, SEQ ID NO: 5 and
Primer CYP 5A#2, SEQ ID NO: 6) to introduce PacI cloning sites. These PCR primers were
designed based upon the DNA sequence determined for CYP52A5A (SEQ ID NO: 90). The
Expand Hi-Fi Taq PCR kit (Boehringer Mannheim, Indianapolis, IN) was used according to
the manufacturer's specifications. The CYP52A5A PCR amplification product was 3,298 base pairs
in length.

The gene encoding CPRB (SEQ ID NO: 82) from C. tropicalis 20336 was
amplified from genomic DNA via PCR using primers (CPR B#1, SEQ ID NO: 7 and CPR B#2,
SEQ ID NO: 8) based upon the DNA sequence determined for CPRB (SEQ ID NO: 82) (Figure
13). These primers were designed to introduce unique PacI cloning sites. The Expand Hi-Fi
Taq PCR kit (Boehringer Mannheim, Indianapolis, IN) was used according to the manufacturer's
specifications. The CPRB PCR product was 3266 bp in length, yielding 747 bp of DNA
upstream of the CPRB start codon and 493 bp downstream of the stop codon for the CPRB ORF.
The resulting PCR products were isolated via agarose gel electrophoresis, purified using QiaexII
and digested with PacI. The PCR fragments were purified, desalted and concentrated using a
Microcon 100 (Amicon, Beverly, MA).

The above described amplification procedures are applicable to the other genes 
listed in Table 5 using the respectively indicated primers. 

C. Cloning of CYP and CPR Genes into pURAin

The next step was to clone the selected CYP and CPR genes into the pURAin
integration vector. In a preferred aspect of the present invention, no foreign DNA other than that
specifically provided by synthetic restriction site sequences is incorporated into the DNA which
was cloned into the genome of C. tropicalis, i.e., with the exception of restriction site DNA only
native C. tropicalis DNA sequences are incorporated into the genome. pURAin was digested
with PacI, Qiaex II cleaned, and dephosphorylated with Shrimp Alkaline Phosphatase (SAP)
(United States Biochemical, Cleveland, OH) according to the manufacturer's recommendations.
Approximately 500 ng of PacI linearized pURAin was dephosphorylated for 1 hr at 37°C using
SAP at a concentration of 0.2 Units of enzyme per 1 pmol of DNA termini. The reaction was
stopped by heat inactivation at 65 °C for 20 min.
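
The amount of enzyme implied by 0.2 Units per 1 pmol of DNA termini depends on the size of the linearized vector, which is not restated here. The following Python sketch shows the arithmetic under two labeled assumptions: an average mass of 650 g/mol per base pair of double-stranded DNA (a common approximation) and a hypothetical vector length of 4,400 bp chosen only for illustration. The function names are likewise illustrative.

```python
AVG_BP_MASS = 650.0  # g/mol per base pair of dsDNA (approximation, assumed)


def pmol_termini(mass_ng, length_bp):
    """Picomoles of DNA ends in a sample of linear double-stranded DNA."""
    # ng -> g is 1e-9, mol -> pmol is 1e12, so the net factor is 1e3.
    pmol_molecules = mass_ng * 1000.0 / (length_bp * AVG_BP_MASS)
    return 2.0 * pmol_molecules  # each linear molecule has two termini


def sap_units(mass_ng, length_bp, units_per_pmol=0.2):
    """Units of phosphatase at the stated ratio of units per pmol termini."""
    return units_per_pmol * pmol_termini(mass_ng, length_bp)


if __name__ == "__main__":
    hypothetical_vector_bp = 4400  # assumption; the actual size is not given here
    print("pmol termini:", pmol_termini(500, hypothetical_vector_bp))
    print("SAP units:", sap_units(500, hypothetical_vector_bp))
```

The same pmol_termini function can be used to balance vector and insert when a ligation calls for an equimolar number of DNA termini.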

The CYP52A2A PacI fragment derived using the primers shown in Table 4 was
ligated to plasmid pURAin which had also been digested with PacI. PacI digested pURAin was
dephosphorylated, and ligated to the CYP52A2A ULTMA PCR product as described previously.
The ligation mixture was transformed into E. coli XL1 Blue MRF' (Stratagene) and 2 resistant
colonies were selected and screened for correct constructs which should contain vector sequence,
the inverted URA3A gene, and the amplified CYP52A2A gene (SEQ ID NO: 86) of 20336.
AscI-PmeI digestion identified one of the two constructs, plasmid pURA2in, as being correct
(Figure 27). This plasmid was sequenced and compared to CYP52A2A (SEQ ID NO: 86) to confirm
that PCR did not introduce DNA base changes that would result in an amino acid change.

Prior to its use, the CPRB PacI fragment derived using the primers shown in
Table 4 was sequenced and compared to CPRB (SEQ ID NO: 82) to confirm that PCR did not
introduce DNA base pair changes that would result in an amino acid change. Following
confirmation, CPRB (SEQ ID NO: 82) was ligated to plasmid pURAin which had also been
digested with PacI. PacI digested pURAin was dephosphorylated, and ligated to the CPR
Expand Hi-Fi PCR product as described previously. The ligation mixture was transformed into
E. coli XL1 Blue MRF' (Stratagene) and several resistant colonies were selected and screened
for correct constructs which should contain vector sequence, the inverted URA3A gene, and the
amplified CPRB gene (SEQ ID NO: 82) of 20336. AscI-PmeI digestion confirmed a successful
construct, pURAREDBin.

In a manner similar to the above, each of the other CYP and CPR genes disclosed
herein is cloned into pURAin. PacI fragments of these genes, whose sequences are given in
Figures 13 and 15, are derivable by methods known to those skilled in the art.

1) Construction of Vectors Used to Generate HDC 20 and HDC 23 

A previously constructed integration vector containing CPRB (SEQ ID NO: 82),
pURAREDBin, was chosen as the starting vector. This vector was partially digested with PacI
and the linearized fragment was gel-isolated. The active PacI site was destroyed by treatment
with T4 DNA polymerase and the vector was re-ligated. Subsequent isolation and complete
digestion of this new plasmid yielded a vector now containing only one active PacI site. This
fragment was gel-isolated, dephosphorylated and ligated to the CYP52A2A PacI fragment. Vectors
that contain the CYP52A2A (SEQ ID NO: 86) and CPRB (SEQ ID NO: 82) genes oriented in the
same direction, pURAin CPR 2A S, as well as opposite directions (5' ends connected), pURAin
CPR 2A O, were generated.



D. Confirmation of CYP Integration (Figure 21 for Integration Scheme) 
into the Genome of C. tropicalis 

Based on the construct, pURA2in, used to transform H5343 ura-, a scheme to
detect integration was devised. Genomic DNA from transformants was digested with DraIII
and SpeI, which are enzymes that cut within the URA3A and URA3B genes but not within the
integrated CYP52A2A gene. Digestion of genomic DNA where an integration had occurred at 
the URA3A or URA3B loci would be expected to result in a 3.5 kb or a 3.3 kb fragment, 
respectively (Figure 28). Moreover, digestion of the same genomic DNA with Pad would yield 
5 a 22 kb fragment characteristic for the integrated CYP52A2A gene (Figure 28). Southern 
hybridizations of these digests with fragments of the CYP52A2A gene were used to screen for 
these integration events. Intensity of the band signal from the Southern using Pad digestion was 
used as a measure of the number of integration events, ((i.e. the more copies of the CYP52A2A 
gene (SEQ ID NO: 86) which are present, the stronger the hybridization signal)). 

10 C tropicalis H5343 transformed URA prototrophs were grown at 30°C, 170 rpm, 

in 10 ml SC-uracil media for preparation of genomic DNA. Genomic DNA was isolated by the 
method described previously. Genomic DNA was digested with Spel and DrallL A 0.95% 
agarose gel was used to prepare a Southern hybridization blot. The DNA from the gel was 
transferred to a MagnaCharge nylon filter membrane (MSI Technologies, Westboro, MA) 

15 according to the alkaline transfer method of Sambrook et al., supra. For the Southern 

hybridization, a 2.2 kb CYPS2A2A DNA fragment was used as a hybridization probe. 300 ng of 
CYP52A2A DNA was labeled using a ECL Direct labeling and detection system (Amersham) and 
the Southern was processed according to the ECL kit specifications. The blot was processed in a 
volume of 30 ml of hybridization fluid corresponding to 0.125 ml/cm 2 . Following a 

20 prehybridization at 42 °C for 1 hr, 300 ng of CYP52A2A probe was added and the hybridization 
continued for 16 hr at 42 °C. Following hybridization, the blots were washed two times for 20 
min each at 42 °C in primary wash containing urea. Two 5 min secondary washes at RT were 
conducted, followed by detection according to directions. The blots were exposed for 16 hours 
(hr) as recommended. 

25 Integration was confirmed by the detection of a Spel-Dralll 3.5 kb fragment from 

the genomic DNA of the transformants but not with the C tropicalis 20336 control. 
Subsequently, a Pad digestion of the genomic DNA of the positive transformants, followed by a 
Southern hybridization using an CYP52A2A gene probe, confirmed integration by the detection 
of a 2.2 kb fragment. The resulting CYPS2A2A integrated strain was named HDCl (see Table 1), 
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In a manner similar to the above, each of the genes contained in the Pad 
fragments which are described in Section 3c above were confirmed for integration into the 
genome of C. tropicalis. 

Transformants generated by transformation with the vectors pURAin CPR 2A S
or pURAin CPR 2A O were analyzed by Southern hybridization for tandem integration of both the
CYP52A2A (SEQ ID NO: 86) and CPRB (SEQ ID NO: 82) genes. Three strains were
generated in which the CYP52A2A (SEQ ID NO: 86) and CPRB (SEQ ID NO: 82) genes
are integrated in the opposite orientation (HDC 20-1, HDC 20-2 and HDC 20-3) and three were
generated with the CYP52A2A (SEQ ID NO: 86) and CPRB (SEQ ID NO: 82) genes integrated
in the same orientation (HDC 23-1, HDC 23-2 and HDC 23-3); see Table 1.

E. Confirmation of CPRB Integration into H5343 ura-

Seven transformants were screened by colony PCR using CPRB primer #2 (SEQ
ID NO: 8) and a URA3A-specific primer. In five of the transformants, successful integration
was detected by the presence of a 3899 bp PCR product. This 3899 bp PCR product represents
the CPRB gene adjacent to the URA3A gene in the genome of H5343, thereby confirming
integration. The resulting CPRB integrated strains were named HDC 10-1 and HDC 10-2 (see
Table 1).

F. Strain Evaluation

As determined by quantitative PCR, when compared to parent H5343, HDC 10-1
contained three additional copies of the reductase gene and HDC 10-2 contained four additional
copies of the reductase gene. Evaluations of HDC 20-1, HDC 20-2 and HDC 20-3 based on
Southern hybridization data indicate that HDC 20-1 contained multiple integrations, i.e., 2 to 3
times that of HDC 20-2 or HDC 20-3. Evaluations of HDC 23-1, HDC 23-2 and HDC 23-3 based
on Southern hybridization data indicate that HDC 23-3 contained multiple integrations, i.e., 2 to
3 times that of HDC 23-1 or HDC 23-2. The data in Table 8 indicate that the integration of
components of the ω-hydroxylase complex has a positive effect on the improvement of
Candida tropicalis ATCC 20962 as a biocatalyst. The results indicate that CYP52A5A (SEQ ID
NO: 90) is an important gene for the conversion of oleic acid to diacid. Surprisingly, tandem
integrations of CYP and CPR genes oriented in the opposite direction (HDC 20 strains) seem to
be less productive than tandem integrations oriented in the same direction (HDC 23 strains);
see Tables 1 and 8.

DCA2 medium

Peptone                                        3.0
Yeast Extract                                  6.0
Sodium Acetate                                 3.0
Yeast Nitrogen Base (Difco)                    6.7
Glucose (anhydrous)                            50.0
Potassium Phosphate (dibasic, trihydrate)      7.2
Potassium Phosphate (monobasic, anhydrous)     93

*See Kaiser et al., Methods in Yeast Genetics, Cold Spring Harbor Laboratory Press, USA (1994), incorporated herein by
reference.

It will be understood that various modifications may be made to the embodiments
and/or examples disclosed herein. Thus, the above description should not be construed as 
limiting, but merely as exemplifications of preferred embodiments. Those skilled in the art will 
envision other modifications within the scope and spirit of the claims appended hereto. 



WO 00/20566 PCT/US99/20797

WHAT IS CLAIMED IS: 

1. Isolated nucleic acid encoding a CPRA protein having the amino
acid sequence set forth in SEQ ID NO: 83.

2. Isolated nucleic acid comprising a coding region defined by nucleotides
1006-3042 as set forth in SEQ ID NO: 81.

3. Isolated nucleic acid according to claim 2 comprising the nucleotide sequence
as set forth in SEQ ID NO: 81.

4. Isolated protein comprising an amino acid sequence as set forth in
SEQ ID NO: 83.

5. A vector comprising a nucleotide sequence encoding CPRA protein including
an amino acid sequence as set forth in SEQ ID NO: 83.

6. A vector according to claim 5 wherein the nucleotide sequence is set forth in
nucleotides 1006-3042 of SEQ ID NO: 81.

7. A vector according to claim 5 wherein the vector is selected from the group 
consisting of plasmid, phagemid, phage and cosmid. 

8. A host cell transfected or transformed with the nucleic acid of claim 1. 

9. A host cell according to claim 8 wherein the host cell is a yeast cell. 

10. A host cell according to claim 9 wherein the yeast cell is a Candida sp.

11. A host cell according to claim 10 wherein the Candida sp. is Candida
tropicalis.

12. A host cell according to claim 11 wherein the Candida tropicalis is Candida
tropicalis 20336.

13. A host cell according to claim 12 wherein the Candida tropicalis is H5343
ura-.

14. A method of producing a CPRA protein including an amino acid sequence as 
set forth in SEQ ID NO: 83 comprising: 

a) transforming a suitable host cell with a DNA sequence that encodes the protein
having the amino acid sequence as set forth in SEQ ID NO: 83; and

b) culturing the cell under conditions favoring the expression of the protein.

15. The method according to claim 14 wherein the step of culturing the cell
comprises adding an organic substrate to media containing the cell.

16. Isolated nucleic acid encoding a CPRB protein having the amino acid 
sequence set forth in SEQ ID NO: 84. 

17. Isolated nucleic acid comprising a coding region defined by nucleotides
1033-3069 as set forth in SEQ ID NO: 82.

18. Isolated nucleic acid according to claim 17 comprising the nucleotide 
sequence as set forth in SEQ ID NO: 82. 

19. Isolated protein comprising an amino acid sequence as set forth in
SEQ ID NO: 84.

20. A vector comprising a nucleotide sequence encoding CPRB protein including
an amino acid sequence as set forth in SEQ ID NO: 84.

21. A vector according to claim 20 wherein the nucleotide sequence is set forth in
nucleotides 1033-3069 of SEQ ID NO: 82.

22. A vector according to claim 20 wherein the vector is selected from the group
consisting of plasmid, phagemid, phage and cosmid.

23. A host cell transfected or transformed with the nucleic acid of claim 16.

24. A host cell according to claim 23 wherein the host cell is a yeast cell. 

25. A host cell according to claim 24 wherein the yeast cell is a Candida sp. 

26. A host cell according to claim 25 wherein the Candida sp. is Candida
tropicalis.

27. A host cell according to claim 26 wherein the Candida tropicalis is Candida
tropicalis 20336.

28. A host cell according to claim 27 wherein the Candida tropicalis is H5343
ura-.

29. A method of producing a CPRB protein including an amino acid sequence as
set forth in SEQ ID NO: 84 comprising:

a) transforming a suitable host cell with a DNA sequence that encodes the protein 
having the amino acid sequence as set forth in SEQ ID NO: 84; and 

b) culturing the cell under conditions favoring the expression of the protein. 

30. The method according to claim 29 wherein the step of culturing the cell
comprises adding an organic substrate to media containing the cell.

31. Isolated nucleic acid encoding a CYP52A1A protein having the amino acid
sequence set forth in SEQ ID NO: 95.

32. Isolated nucleic acid comprising a coding region defined by nucleotides
1177-2748 as set forth in SEQ ID NO: 85.

33. Isolated nucleic acid according to claim 32 comprising the nucleotide sequence
as set forth in SEQ ID NO: 85.

34. Isolated protein comprising an amino acid sequence as set forth in
SEQ ID NO: 95.

35. A vector comprising a nucleotide sequence encoding CYP52A1A protein
including an amino acid sequence as set forth in SEQ ID NO: 95.

36. A vector according to claim 35 wherein the nucleotide sequence is set forth in
nucleotides 1177-2748 of SEQ ID NO: 85.

37. A vector according to claim 35 wherein the vector is selected from the group 
consisting of plasmid, phagemid, phage and cosmid. 

38. A host cell transfected or transformed with the nucleic acid of claim 31.

39. A host cell according to claim 38 wherein the host cell is a yeast cell.

40. A host cell according to claim 39 wherein the yeast cell is a Candida sp.

41. A host cell according to claim 40 wherein the Candida sp. is Candida
tropicalis.

42. A host cell according to claim 41 wherein the Candida tropicalis is Candida
tropicalis 20336.

43. A host cell according to claim 42 wherein the Candida tropicalis is H5343
ura-.

44. A method of producing a CYP52A1A protein including an amino acid
sequence as set forth in SEQ ID NO: 95 comprising:

a) transforming a suitable host cell with a DNA sequence that encodes the protein
having the amino acid sequence as set forth in SEQ ID NO: 95; and

b) culturing the cell under conditions favoring the expression of the protein.

45. The method according to claim 44 wherein the step of culturing the cell 
comprises adding an organic substrate to media containing the cell. 

46. Isolated nucleic acid encoding a CYP52A2A protein having the amino acid
sequence set forth in SEQ ID NO: 96.

47. Isolated nucleic acid comprising a coding region defined by nucleotides
1199-2767 as set forth in SEQ ID NO: 86.

48. Isolated nucleic acid according to claim 47 comprising the nucleotide 
sequence as set forth in SEQ ID NO: 86. 

49. Isolated protein comprising an amino acid sequence as set forth in
SEQ ID NO: 96.

50. A vector comprising a nucleotide sequence encoding CYP52A2A protein 
including an amino acid sequence as set forth in SEQ ID NO: 96. 

51. A vector according to claim 50 wherein the nucleotide sequence is set forth in
nucleotides 1199-2767 of SEQ ID NO: 86.

52. A vector according to claim 50 wherein the vector is selected from the group
consisting of plasmid, phagemid, phage and cosmid.

53. A host cell transfected or transformed with the nucleic acid of claim 46.

54. A host cell according to claim 53 wherein the host cell is a yeast cell.



55. A host cell according to claim 54 wherein the yeast cell is a Candida sp. 

56. A host cell according to claim 55 wherein the Candida sp. is Candida
tropicalis.

57. A host cell according to claim 56 wherein the Candida tropicalis is Candida
tropicalis 20336.

58. A host cell according to claim 57 wherein the Candida tropicalis is H5343
ura-.

59. A method of producing a CYP52A2A protein including an amino acid
sequence as set forth in SEQ ID NO: 96 comprising:

a) transforming a suitable host cell with a DNA sequence that encodes the protein 
having the amino acid sequence as set forth in SEQ ID NO: 96; and 

b) culturing the cell under conditions favoring the expression of the protein. 

60. The method according to claim 59 wherein the step of culturing the cell
comprises adding an organic substrate to media containing the cell.

61. Isolated nucleic acid encoding a CYP52A2B protein having the amino acid
sequence set forth in SEQ ID NO: 97.

62. Isolated nucleic acid comprising a coding region defined by nucleotides 1072- 
2640 as set forth in SEQ ID NO: 87. 

63. Isolated nucleic acid according to claim 62 comprising the nucleotide sequence
as set forth in SEQ ID NO: 87.

64. Isolated protein comprising an amino acid sequence as set forth in
SEQ ID NO: 97.

65. A vector comprising a nucleotide sequence encoding CYP52A2B protein
including an amino acid sequence as set forth in SEQ ID NO: 97.

66. A vector according to claim 65 wherein the nucleotide sequence is set forth in 
nucleotides 1072-2640 of SEQ ID NO: 87. 

67. A vector according to claim 65 wherein the vector is selected from the group
consisting of plasmid, phagemid, phage and cosmid.

68. A host cell transfected or transformed with the nucleic acid of claim 61.

69. A host cell according to claim 68 wherein the host cell is a yeast cell.

70. A host cell according to claim 69 wherein the yeast cell is a Candida sp. 



71. A host cell according to claim 70 wherein the Candida sp. is Candida
tropicalis.

72. A host cell according to claim 71 wherein the Candida tropicalis is Candida
tropicalis 20336.

73. A host cell according to claim 72 wherein the Candida tropicalis is H5343
ura-.

74. A method of producing a CYP52A2B protein including an amino acid
sequence as set forth in SEQ ID NO: 97 comprising:

a) transforming a suitable host cell with a DNA sequence that encodes the protein
having the amino acid sequence as set forth in SEQ ID NO: 97; and

b) culturing the cell under conditions favoring the expression of the protein.

75. The method according to claim 74 wherein the step of culturing the cell
comprises adding an organic substrate to media containing the cell.

76. Isolated nucleic acid encoding a CYP52A3A protein having the amino acid
sequence set forth in SEQ ID NO: 98.

77. Isolated nucleic acid comprising a coding region defined by nucleotides
1126-2748 as set forth in SEQ ID NO: 88.

78. Isolated nucleic acid according to claim 77 comprising the nucleotide sequence
as set forth in SEQ ID NO: 88.

79. Isolated protein comprising an amino acid sequence as set forth in
SEQ ID NO: 98.

80. A vector comprising a nucleotide sequence encoding CYP52A3A protein
including an amino acid sequence as set forth in SEQ ID NO: 98.

81. A vector according to claim 80 wherein the nucleotide sequence is set forth in
nucleotides 1126-2748 of SEQ ID NO: 88.

82. A vector according to claim 80 wherein the vector is selected from the group 
consisting of plasmid, phagemid, phage and cosmid. 

83. A host cell transfected or transformed with the nucleic acid of claim 76.

84. A host cell according to claim 83 wherein the host cell is a yeast cell. 

85. A host cell according to claim 84 wherein the yeast cell is a Candida sp. 

86. A host cell according to claim 85 wherein the Candida sp. is Candida
tropicalis.

87. A host cell according to claim 86 wherein the Candida tropicalis is Candida
tropicalis 20336.

88. A host cell according to claim 87 wherein the Candida tropicalis is H5343
ura-.

89. A method of producing a CYP52A3A protein including an amino acid 
sequence as set forth in SEQ ID NO: 98 comprising: 

a) transforming a suitable host cell with a DNA sequence that encodes the protein
having the amino acid sequence as set forth in SEQ ID NO: 98; and

b) culturing the cell under conditions favoring the expression of the protein. 

90. The method according to claim 89 wherein the step of culturing the cell 
comprises adding an organic substrate to media containing the cell. 

91. Isolated nucleic acid encoding a CYP52A3B protein having the amino acid 
sequence as set forth in SEQ ID NO: 99. 

92. Isolated nucleic acid comprising a coding region defined by nucleotides
913-2535 as set forth in SEQ ID NO: 89.

93. Isolated nucleic acid according to claim 92 comprising the nucleotide 
sequence as set forth in SEQ ID NO: 89. 



94. Isolated protein comprising an amino acid sequence as set forth in
SEQ ID NO: 99.

95. A vector comprising a nucleotide sequence encoding CYP52A3B protein
including an amino acid sequence as set forth in SEQ ID NO: 99.

96. A vector according to claim 95 wherein the nucleotide sequence is set forth in
nucleotides 913-2535 of SEQ ID NO: 89.

97. A vector according to claim 95 wherein the vector is selected from the group
consisting of plasmid, phagemid, phage and cosmid.

98. A host cell transfected or transformed with the nucleic acid of claim 91.

99. A host cell according to claim 98 wherein the host cell is a yeast cell.

100. A host cell according to claim 99 wherein the yeast cell is a Candida sp.

101. A host cell according to claim 100 wherein the Candida sp. is Candida
tropicalis.

102. A host cell according to claim 101 wherein the Candida tropicalis is 
Candida tropicalis 20336. 

103. A host cell according to claim 102 wherein the Candida tropicalis is H5343
ura-.

104. A method of producing a CYP52A3B protein including an amino acid
sequence as set forth in SEQ ID NO: 99 comprising:

a) transforming a suitable host cell with a DNA sequence that encodes the protein
having the amino acid sequence as set forth in SEQ ID NO: 99; and

b) culturing the cell under conditions favoring the expression of the protein.

105. The method according to claim 104 wherein the step of culturing the cell
comprises adding an organic substrate to media containing the cell.

106. Isolated nucleic acid encoding a CYP52A5A protein having the amino acid 
sequence set forth in SEQ ID NO: 100. 

107. Isolated nucleic acid comprising a coding region defined by nucleotides
1103-2656 as set forth in SEQ ID NO: 90.

108. Isolated nucleic acid according to claim 107 comprising the nucleotide
sequence as set forth in SEQ ID NO: 90.

109. Isolated protein comprising an amino acid sequence as set forth in
SEQ ID NO: 100.

110. A vector comprising a nucleotide sequence encoding CYP52A5A protein
including an amino acid sequence as set forth in SEQ ID NO: 100.

111. A vector according to claim 110 wherein the nucleotide sequence is set forth
in nucleotides 1103-2656 of SEQ ID NO: 90.

112. A vector according to claim 110 wherein the vector is selected from the
group consisting of plasmid, phagemid, phage and cosmid.

113. A host cell transfected or transformed with the nucleic acid of claim 106.

114. A host cell according to claim 113 wherein the host cell is a yeast cell.

115. A host cell according to claim 114 wherein the yeast cell is a Candida sp.

116. A host cell according to claim 115 wherein the Candida sp. is Candida
tropicalis.

117. A host cell according to claim 116 wherein the Candida tropicalis is
Candida tropicalis 20336.

118. A host cell according to claim 117 wherein the Candida tropicalis is H5343
ura-.

119. A method of producing a CYP52A5A protein including an amino acid
sequence as set forth in SEQ ID NO: 100 comprising:

a) transforming a suitable host cell with a DNA sequence that encodes the protein
having the amino acid sequence as set forth in SEQ ID NO: 100; and

b) culturing the cell under conditions favoring the expression of the protein.

120. The method according to claim 119 wherein the step of culturing the cell
comprises adding an organic substrate to media containing the cell.

121. Isolated nucleic acid encoding a CYP52A5B protein having the amino acid 
sequence as set forth in SEQ ID NO: 101. 

122. Isolated nucleic acid comprising a coding region defined by nucleotides
1142-2695 as set forth in SEQ ID NO: 91.

123. Isolated nucleic acid according to claim 122 comprising the nucleotide
sequence as set forth in SEQ ID NO: 91.

124. Isolated protein comprising an amino acid sequence as set forth in
SEQ ID NO: 101.

125. A vector comprising a nucleotide sequence encoding CYP52A5B protein
including the amino acid sequence as set forth in SEQ ID NO: 101.

126. A vector according to claim 125 wherein the nucleotide sequence is set forth
in nucleotides 1142-2695 of SEQ ID NO: 91.

127. A vector according to claim 125 wherein the vector is selected from the
group consisting of plasmid, phagemid, phage and cosmid.

128. A host cell transfected or transformed with the nucleic acid of claim 121.

129. A host cell according to claim 128 wherein the host cell is a yeast cell.

130. A host cell according to claim 129 wherein the yeast cell is a Candida sp.

131. A host cell according to claim 130 wherein the Candida sp. is Candida
tropicalis.

132. A host cell according to claim 131 wherein the Candida tropicalis is
Candida tropicalis 20336.

133. A host cell according to claim 132 wherein the Candida tropicalis is H5343
ura-.

134. A method of producing a CYP52A5B protein including an amino acid
sequence as set forth in SEQ ID NO: 101 comprising:

a) transforming a suitable host cell with a DNA sequence that encodes the protein
having the amino acid sequence as set forth in SEQ ID NO: 101; and

b) culturing the cell under conditions favoring the expression of the protein. 

135. The method according to claim 134 wherein the step of culturing the cell
comprises adding an organic substrate to media containing the cell.

136. Isolated nucleic acid encoding a CYP52A8A protein having the amino acid
sequence set forth in SEQ ID NO: 102.

137. Isolated nucleic acid comprising a coding region defined by nucleotides
464-2002 as set forth in SEQ ID NO: 92.

138. Isolated nucleic acid according to claim 137 comprising the nucleotide 
sequence as set forth in SEQ ID NO: 92. 

139. Isolated protein comprising an amino acid sequence as set forth in
SEQ ID NO: 102.

140. A vector comprising a nucleotide sequence encoding CYP52A8A protein
including an amino acid sequence as set forth in SEQ ID NO: 102.

141. A vector according to claim 140 wherein the nucleotide sequence is set forth
in nucleotides 464-2002 of SEQ ID NO: 92.

142. A vector according to claim 140 wherein the vector is selected from the 
group consisting of plasmid, phagemid, phage and cosmid. 

143. A host cell transfected or transformed with the nucleic acid of claim 136.

144. A host cell according to claim 143 wherein the host cell is a yeast cell. 

145. A host cell according to claim 144 wherein the yeast cell is a Candida sp. 

146. A host cell according to claim 145 wherein the Candida sp. is Candida
tropicalis.

147. A host cell according to claim 146 wherein the Candida tropicalis is
Candida tropicalis 20336.

148. A host cell according to claim 147 wherein the Candida tropicalis is H5343
ura-.

149. A method of producing a CYP52A8A protein including an amino acid
sequence as set forth in SEQ ID NO: 102 comprising:

a) transforming a suitable host cell with a DNA sequence that encodes the protein 
having the amino acid sequence as set forth in SEQ ID NO: 102; and 

b) culturing the cell under conditions favoring the expression of the protein. 

150. The method according to claim 149 wherein the step of culturing the cell
comprises adding an organic substrate to media containing the cell.

151. Isolated nucleic acid encoding a CYP52A8B protein having the amino acid
sequence set forth in SEQ ID NO: 103.

152. Isolated nucleic acid comprising a coding region defined by nucleotides
1017-2555 as set forth in SEQ ID NO: 93.

153. Isolated nucleic acid according to claim 152 comprising the nucleotide 
sequence as set forth in SEQ ID NO: 93. 

154. Isolated protein comprising an amino acid sequence as set forth in
SEQ ID NO: 103.

155. A vector comprising a nucleotide sequence encoding CYP52A8B protein
including an amino acid sequence as set forth in SEQ ID NO: 103.

156. A vector according to claim 155 wherein the nucleotide sequence is set forth 
in nucleotides 1017-2555 of SEQ ID NO: 93. 

157. A vector according to claim 155 wherein the vector is selected from the
group consisting of plasmid, phagemid, phage and cosmid.

158. A host cell transfected or transformed with the nucleic acid of claim 151.

159. A host cell according to claim 158 wherein the host cell is a yeast cell.

160. A host cell according to claim 159 wherein the yeast cell is a Candida sp.

161. A host cell according to claim 160 wherein the Candida sp. is Candida
tropicalis.

162. A host cell according to claim 161 wherein the Candida tropicalis is
Candida tropicalis 20336.

163. A host cell according to claim 162 wherein the Candida tropicalis is H5343
ura-.



164. A method of producing a CYP52A8B protein including an amino acid
sequence as set forth in SEQ ID NO: 103 comprising:

a) transforming a suitable host cell with a DNA sequence that encodes the protein
having the amino acid sequence as set forth in SEQ ID NO: 103; and

b) culturing the cell under conditions favoring the expression of the protein. 

165. The method according to claim 164 wherein the step of culturing the cell
comprises adding an organic substrate to media containing the cell.

166. Isolated nucleic acid encoding a CYP52D4A protein having the amino acid
sequence set forth in SEQ ID NO: 104.

167. Isolated nucleic acid comprising a coding region defined by nucleotides
767-2266 as set forth in SEQ ID NO: 94.

168. Isolated nucleic acid according to claim 167 comprising the nucleotide
sequence as set forth in SEQ ID NO: 94.

169. Isolated protein comprising an amino acid sequence as set forth in
SEQ ID NO: 104.

170. A vector comprising a nucleotide sequence encoding CYP52D4A protein
including an amino acid sequence as set forth in SEQ ID NO: 104.

171. A vector according to claim 170 wherein the nucleotide sequence is set forth 
in nucleotides 767-2266 of SEQ ID NO: 94. 

172. A vector according to claim 170 wherein the vector is selected from the
group consisting of plasmid, phagemid, phage and cosmid.

173. A host cell transfected or transformed with the nucleic acid of claim 166.

174. A host cell according to claim 173 wherein the host cell is a yeast cell. 

175. A host cell according to claim 174 wherein the yeast cell is a Candida sp.

176. A host cell according to claim 175 wherein the Candida sp. is Candida
tropicalis.

177. A host cell according to claim 176 wherein the Candida tropicalis is
Candida tropicalis 20336.

178. A host cell according to claim 177 wherein the Candida tropicalis is H5343
ura-.

179. A method of producing a CYP52D4A protein including an amino acid 
sequence as set forth in SEQ ID NO: 104 comprising: 

a) transforming a suitable host cell with a DNA sequence that encodes the protein
having the amino acid sequence as set forth in SEQ ID NO: 104; and

b) culturing the cell under conditions favoring the expression of the protein.



180. The method according to claim 179 wherein the step of culturing the cell 
comprises adding an organic substrate to media containing the cell. 

181. A method for discriminating members of a gene family by quantifying the
amount of target mRNA in a sample comprising:

a) providing an organism containing a target gene;

b) culturing the organism with an organic substrate which causes
upregulation in the activity of the target gene;

c) obtaining a sample of total RNA from the organism at a first point in
time;

d) combining at least a portion of the sample of the total RNA with a
known amount of competitor RNA to form an RNA mixture, wherein the competitor RNA is
substantially similar to the target mRNA but has a lesser number of nucleotides compared to the
target mRNA;

e) adding reverse transcriptase to the RNA mixture in a quantity sufficient
to form corresponding target DNA and competitor DNA;

f) conducting a polymerase chain reaction in the presence of at least one
primer specific for at least one substantially non-homologous region of the target DNA within the
gene family, the primer also specific for the competitor DNA;

g) repeating steps (c-f) using increasing amounts of the competitor RNA
while maintaining a substantially constant amount of target RNA;

h) determining the point at which the amount of target DNA is
substantially equal to the amount of competitor DNA;

i) quantifying the results by comparing the ratio of the concentration of
unknown target to the known concentration of competitor; and

j) obtaining a sample of total RNA from the organism at another point in
time and repeating steps (d-i).
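
The titration in steps g) through i) amounts to locating an equivalence point. One common way to do this, sketched below as an assumption rather than as the claimed method, is to fit log10(competitor product / target product) against log10(competitor RNA added) and solve for the amount at which the ratio equals one. The function name and the signal values in the usage note are hypothetical, and the sketch ignores any correction for the length difference between competitor and target.

```python
import math


def equivalence_point(competitor_amounts, target_signals, competitor_signals):
    """Estimate the competitor amount at which both PCR products are equal.

    Fits a least-squares line to log10(competitor/target signal) versus
    log10(competitor amount) and returns the amount where the line crosses
    zero, i.e. where the signal ratio is 1. All inputs must be positive.
    """
    if not (len(competitor_amounts) == len(target_signals) == len(competitor_signals)):
        raise ValueError("input sequences must have equal length")
    if len(competitor_amounts) < 2:
        raise ValueError("at least two titration points are required")

    xs = [math.log10(amount) for amount in competitor_amounts]
    ys = [
        math.log10(comp / targ)
        for comp, targ in zip(competitor_signals, target_signals)
    ]

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or sxy == 0:
        raise ValueError("titration data do not define a usable slope")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Solve slope * log10(amount) + intercept = 0 for amount.
    return 10 ** (-intercept / slope)
```

With made-up signals in which the target product stays at 10 units while the competitor product reads 1, 10 and 100 units for 1, 10 and 100 units of competitor RNA, the estimate is 10 units of competitor, which under this simplification would be taken as the amount of target mRNA in the sample.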

182. A method according to claim 181 wherein the target gene is selected from 
20 the group consisting of a CPR gene and a CYP gene. 

183. A method according to claim 182 wherein the CPR gene is selected from the 
group consisting of a CPRA gene (SEQ ID NO: 81) and a CPRB gene (SEQ ID NO: 82). 

184. A method according to claim 182 wherein the CYP gene is selected from the
group consisting of CYP52A1A gene (SEQ ID NO: 85), CYP52A2A gene (SEQ ID NO: 86),
CYP52A2B gene (SEQ ID NO: 87), CYP52A3A gene (SEQ ID NO: 88), CYP52A3B gene (SEQ
ID NO: 89), CYP52A5A gene (SEQ ID NO: 90), CYP52A5B gene (SEQ ID NO: 91), CYP52A8A
gene (SEQ ID NO: 92), CYP52A8B gene (SEQ ID NO: 93) and CYP52D4A gene (SEQ ID NO:
94).
185. A method for increasing production of a dicarboxylic acid comprising:

a) providing a host cell having a naturally occurring number of CPRA genes;

b) increasing, in the host cell, the number of CPRA genes which encode a CPRA
protein having the amino acid sequence as set forth in SEQ ID NO: 83;

c) culturing the host cell in media containing an organic substrate which
upregulates the CPRA gene, to effect increased production of dicarboxylic acid.

186. A method for increasing the production of a CPRA protein having an amino 
acid sequence as set forth in SEQ ID NO: 83 comprising: 

a) transforming a host cell having a naturally occurring amount of CPRA protein
with an increased copy number of a CPRA gene that encodes the CPRA protein having the amino
acid sequence as set forth in SEQ ID NO: 83; and

b) culturing the cell and thereby increasing expression of the protein compared 
with that of a host cell containing a naturally occurring copy number of the CPRA gene. 

187. A method for increasing production of a dicarboxylic acid comprising: 

a) providing a host cell having a naturally occurring number of CPRB genes; 

b) increasing, in the host cell, the number of CPRB genes which encode a CPRB 
protein having the amino acid sequence as set forth in SEQ ID NO: 84; 

c) culturing the host cell in media containing an organic substrate which
upregulates the CPRB gene, to effect increased production of dicarboxylic acid.

188. A method for increasing the production of a CPRB protein having an amino 
acid sequence as set forth in SEQ ID NO: 84 comprising: 

a) transforming a host cell having a naturally occurring amount of CPRB protein
with an increased copy number of a CPRB gene that encodes the CPRB protein having the amino
acid sequence as set forth in SEQ ID NO: 84; and

b) culturing the cell and thereby increasing expression of the protein compared 
with that of a host cell containing a naturally occurring copy number of the CPRB gene. 



189. A method for increasing production of a dicarboxylic acid comprising:

a) providing a host cell having a naturally occurring number of CYP52A1A genes;

b) increasing, in the host cell, the number of CYP52A1A genes which encode a
CYP52A1A protein having the amino acid sequence as set forth in SEQ ID NO: 95;

c) culturing the host cell in media containing an organic substrate which 
upregulates the CYP52A2A gene, to effect increased production of dicarboxylic acid. 



190. A method for increasing the production of a CYP52A1A protein having an 
amino acid sequence as set forth in SEQ ID NO: 95 comprising: 

a) transforming a host cell having a naturally occurring amount of CYP52A1A
protein with an increased copy number of a CYP52A1A gene that encodes the CYP52A1A protein
having the amino acid sequence as set forth in SEQ ID NO: 95; and

b) culturing the cell and thereby increasing expression of the protein compared 
with that of a host cell containing a naturally occurring copy number of the CYP52A1A gene. 

191. A method for increasing production of a dicarboxylic acid comprising:

a) providing a host cell having a naturally occurring number of CYP52A2A genes;

b) increasing, in the host cell, the number of CYP52A2A genes which encode a 
CYP52A2A protein having the amino acid sequence as set forth in SEQ ID NO: 96; 

c) culturing the host cell in media containing an organic substrate which 
upregulates the CYP52A2A gene, to effect increased production of dicarboxylic acid. 

192. A method for increasing the production of a CYP52A2A protein having an
amino acid sequence as set forth in SEQ ID NO: 96 comprising:

a) transforming a host cell having a naturally occurring amount of CYP52A2A
protein with an increased copy number of a CYP52A2A gene that encodes the CYP52A2A protein
having the amino acid sequence as set forth in SEQ ID NO: 96; and

b) culturing the cell and thereby increasing expression of the protein compared 
with that of a host cell containing a naturally occurring copy number of the CYP52A2A gene. 

193. A method for increasing production of a dicarboxylic acid comprising: 

a) providing a host cell having a naturally occurring number of CYP52A2B genes;

b) increasing, in the host cell, the number of CYP52A2B genes which encode a
CYP52A2B protein having the amino acid sequence as set forth in SEQ ID NO: 97;

c) culturing the host cell in media containing an organic substrate which
upregulates the CYP52A2B gene, to effect increased production of dicarboxylic acid.



194. A method for increasing the production of a CYP52A2B protein having an
amino acid sequence as set forth in SEQ ID NO: 97 comprising:

a) transforming a host cell having a naturally occurring amount of CYP52A2B
protein with an increased copy number of a CYP52A2B gene that encodes the CYP52A2B protein
having the amino acid sequence as set forth in SEQ ID NO: 97; and

b) culturing the cell and thereby increasing expression of the protein compared
with that of a host cell containing a naturally occurring copy number of the CYP52A2B gene.

195. A method for increasing production of a dicarboxylic acid comprising: 

a) providing a host cell having a naturally occurring number of CYP52A3A genes;

b) increasing, in the host cell, the number of CYP52A3A genes which encode a
CYP52A3A protein having the amino acid sequence as set forth in SEQ ID NO: 98;

c) culturing the host cell in media containing an organic substrate which 
upregulates the CYP52A3A gene, to effect increased production of dicarboxylic acid. 

196. A method for increasing the production of a CYP52A3A protein having an
amino acid sequence as set forth in SEQ ID NO: 98 comprising:

a) transforming a host cell having a naturally occurring amount of CYP52A3A
protein with an increased copy number of a CYP52A3A gene that encodes the CYP52A3A protein
having the amino acid sequence as set forth in SEQ ID NO: 98; and

b) culturing the cell and thereby increasing expression of the protein compared
with that of a host cell containing a naturally occurring copy number of the CYP52A3A gene.

197. A method for increasing production of a dicarboxylic acid comprising: 

a) providing a host cell having a naturally occurring number of CYP52A3B genes; 

b) increasing, in the host cell, the number of CYP52A3B genes which encode a
CYP52A3B protein having the amino acid sequence as set forth in SEQ ID NO: 99;

c) culturing the host cell in media containing an organic substrate which 
upregulates the CYP52A3B gene, to effect increased production of dicarboxylic acid. 
198. A method for increasing the production of a CYP52A3B protein having an
amino acid sequence as set forth in SEQ ID NO: 99 comprising:

a) transforming a host cell having a naturally occurring amount of CYP52A3B
protein with an increased copy number of a CYP52A3B gene that encodes the CYP52A3B protein
having the amino acid sequence as set forth in SEQ ID NO: 99; and

b) culturing the cell and thereby increasing expression of the protein compared 
with that of a host cell containing a naturally occurring copy number of the CYP52A3B gene. 

199. A method for increasing production of a dicarboxylic acid comprising: 

a) providing a host cell having a naturally occurring number of CYP52A5A genes;

b) increasing, in the host cell, the number of CYP52A5A genes which encode a
CYP52A5A protein having the amino acid sequence as set forth in SEQ ID NO: 100;

c) culturing the host cell in media containing an organic substrate which 
upregulates the CYP52A5A gene, to effect increased production of dicarboxylic acid. 

200. A method for increasing the production of a CYP52A5A protein having an
amino acid sequence as set forth in SEQ ID NO: 100 comprising:

a) transforming a host cell having a naturally occurring amount of CYP52A5A
protein with an increased copy number of a CYP52A5A gene that encodes the CYP52A5A protein
having the amino acid sequence as set forth in SEQ ID NO: 100; and

b) culturing the cell and thereby increasing expression of the protein compared 
with that of a host cell containing a naturally occurring copy number of the CYP52A5A gene. 

201. A method for increasing production of a dicarboxylic acid comprising:

a) providing a host cell having a naturally occurring number of CYP52A5B genes;

b) increasing, in the host cell, the number of CYP52A5B genes which encode a 
CYP52A5B protein having the amino acid sequence as set forth in SEQ ID NO: 101; 

c) culturing the host cell in media containing an organic substrate which 
upregulates the CYP52A5B gene, to effect increased production of dicarboxylic acid. 

202. A method for increasing the production of a CYP52A5B protein having an
amino acid sequence as set forth in SEQ ID NO: 101 comprising:

a) transforming a host cell having a naturally occurring amount of CYP52A5B
protein with an increased copy number of a CYP52A5B gene that encodes the CYP52A5B protein
having the amino acid sequence as set forth in SEQ ID NO: 101; and

b) culturing the cell and thereby increasing expression of the protein compared
with that of a host cell containing a naturally occurring copy number of the CYP52A5B gene.

203. A method for increasing production of a dicarboxylic acid comprising: 

a) providing a host cell having a naturally occurring number of CYP52A8A genes; 

b) increasing, in the host cell, the number of CYP52A8A genes which encode a
CYP52A8A protein having the amino acid sequence as set forth in SEQ ID NO: 102;

c) culturing the host cell in media containing an organic substrate which 
upregulates the CYP52A8A gene, to effect increased production of dicarboxylic acid. 

204. A method for increasing the production of a CYP52A8A protein having an
amino acid sequence as set forth in SEQ ID NO: 102 comprising:

a) transforming a host cell having a naturally occurring amount of CYP52A8A
protein with an increased copy number of a CYP52A8A gene that encodes the CYP52A8A protein
having the amino acid sequence as set forth in SEQ ID NO: 102; and

b) culturing the cell and thereby increasing expression of the protein compared
with that of a host cell containing a naturally occurring copy number of the CYP52A8A gene.

205. A method for increasing production of a dicarboxylic acid comprising:

a) providing a host cell having a naturally occurring number of CYP52A8B genes;

b) increasing, in the host cell, the number of CYP52A8B genes which encode a
CYP52A8B protein having the amino acid sequence as set forth in SEQ ID NO: 103;

c) culturing the host cell in media containing an organic substrate which
upregulates the CYP52A8B gene, to effect increased production of dicarboxylic acid.

206. A method for increasing the production of a CYP52A8B protein having an
amino acid sequence as set forth in SEQ ID NO: 103 comprising:

a) transforming a host cell having a naturally occurring amount of CYP52A8B
protein with an increased copy number of a CYP52A8B gene that encodes the CYP52A8B protein
having the amino acid sequence as set forth in SEQ ID NO: 103; and

b) culturing the cell and thereby increasing expression of the protein compared
with that of a host cell containing a naturally occurring copy number of the CYP52A8B gene.

207. A method for increasing production of a dicarboxylic acid comprising: 

a) providing a host cell having a naturally occurring number of CYP52D4A genes; 

b) increasing, in the host cell, the number of CYP52D4A genes which encode a
CYP52D4A protein having the amino acid sequence as set forth in SEQ ID NO: 104;

c) culturing the host cell in media containing an organic substrate which
upregulates the CYP52D4A gene, to effect increased production of dicarboxylic acid.

208. A method for increasing the production of a CYP52D4A protein having an
amino acid sequence as set forth in SEQ ID NO: 104 comprising:

a) transforming a host cell having a naturally occurring amount of CYP52D4A
protein with an increased copy number of a CYP52D4A gene that encodes the CYP52D4A protein
having the amino acid sequence as set forth in SEQ ID NO: 104; and

b) culturing the cell and thereby increasing expression of the protein compared
with that of a host cell containing a naturally occurring copy number of the CYP52D4A gene.

209. A method for discriminating members of a gene family according to claim 
181 wherein culturing the organism with the organic substrate is accomplished in a fermentor. 
QC-RT-PCR primers for the 5' coding sequence of
Candida tropicalis 20336 P450 CYP52A5A

[Figure: double-stranded DNA sequence of the 5' coding region, nucleotide positions 1 to 540,
with the forward primer and reverse primer binding sites marked.]
C. tropicalis 20336 CPR Allele DNA Alignment of DS Sequence

[Figure: nucleotide alignment of the CPRA and CPRB alleles, ending at position 4205 for CPRA
and position 4145 for CPRB.]
[Figure: amino acid alignment of the CPRA and CPRB proteins, residues 1 to 680.]
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C. tropkalls 20336 CYP52 DNA Alignment of DS Sequence 



C7F52A1A 1 0 

OT52A2A 1 GACCTGTGACGCTTCCGGTGTCTTGCCACCAGTCTCCAAGTIGACCGAC^ 70 

C7P52A33 1 . 0 

CTP52A3A 1 CACATCATAAT 11 

CYF52A33 1 0 

CYPS2A5A 1 0 

CYP52A53 1 TTACAATCATGQ 12 

CYF52A8A l" 0 

CYP52AS3 1 0 

CYP52D4A 1 0 

C7P52A1A 1 CATATG CGC7AA7 C I I C ' l XI WCi aIX i A ? CACAGC AGAAACTA7CCCAC CCCCAC77 C 59 

CYP52A2A 71 A777CCGGT7ACAC77CCAAGA7GG CTGG7AC7GAAGAAGG7XI7CACGGAACCACAAGCTAC777C7CC3 140 

OT52A23 1 0 

CYPS2A3A 12 GACCCGGTTATTTCCCCCTCACGTrGCTTATTTGAGCCGT;^^ 81 

CY752A33 1 0 

CYPS2ASA 1 7GGAG7C 7 

CYP52AS3 13 AGCTCGCTAGGAACCCAGATGTCTGGGAGAAGCTCCGCGAAGAG^ 82 

CYP52A8A 1 0 

CYPS2A83 1 0 

CYPS2D4A 1 0 



CYP52A1A €0 CAAACACAATGACAACTCCTGCGrAACTTGCAAAnCTrGTCTCACTAA 129 

CYF52A2A 141 C77G777CGG7C^CCA77C77GG7GTrCCACCCAATGA*G^^ 210 

CYP52A23 1 GCTCAACAATTGTCTGACAAGATC7C 25 

CYP52A3A 82 AAAC7C7AG7A7AA7GG7GA7AAC7GC77GCAC7C77GCCA7AGGCA7CAAA^^ 151 

CYP52A33 1 - 0 

CY7S2A5A 8 GCCA<^C77GC7CAC7777GAC7CCCT7CCAAAC7CAAAG7ACG77CA^ 77 

CYPS2A53 83 GCCAGAC77GC7CACX7r7CAC7C7CT7AGAAGC7CA^ 1S2 

CV752ASA 1 * * • . ■ . • 0 

CYP52AS3 1 AAAACCGA7ACAAGAAGAAGACAG7CAA 28' 

CY552D4A 1 0 



CY752A1A 130 GACC7CCAG7CAAACGGACAGACA6ACAAACACTrGG7GCCA7G77CATACC^ 

CYP52A2A 211 GCAACACAAGGC7AACCCC7GG77G77GAACACCGG77GvG77C< ; * I C77 C7GC7GCTAGAGGXGG7AAO 

CY552A23 27 GC^CACAAGGCTAAC0CC7Gg77G77GAACAC7C<i77G^377G ^ \ I CTI C7G C7GC7AGAGG7GG7AAG 

CY752A3A 152 A7A777AA7AAGCG7AG<»AG7A7AGCA7CCA7A7CACCG v : C7A7A 1 ¥ 77 7AAGA7AA7C7C7AC7 

CY752A33 1 CC7GCAGA 

Cf?52ASA 78 CG7ATC7ACCCC<K5G^7ACCACGAAACA7GAAGACAG - - C7ACG7GCAACACGACG7TCCCACX3CGGAGG 

CYP52A53 151 CG7A7C7ACCCGCCG<;7GCCACGAAACA7GAAGACAG--C7ACG^ 

CY752ASA 1 

CY752AS3 29 CAAGAACGT7AA7G7CAACCAGG CG CCAAGAAGACGG - - 777GGCGG AC77GGAAG AATG7GGCA777GC 

CYP5204A 1 



199 
2S0 
55 
221 
8 

145 
220 

0 
5* 

0 



CY7S2A1A 200 7GTTAGArGACG^TTTCT7GCAXXGAC-AGG?G77GGCATC 258 

CY?52A2A 281 AGA7GC7CA77CAAG7ACACCACyvGCCA7TTrG<?ACCC7A7C^ 350 

CY752A23 97 AGA7G77CA77GAAC7ACACCAGAGCCA71- i 1 0 <&CGC7ATCCAC7C7G<rrCAA7Tg7CCAAG G77CA A7 . 158 

CY752A3A 222 AAA7T77G7A77C7CAG7AGCA77^CATCAAA77TCGC^CCAA77C7GGCGAAAAAA 291 

CYP52A33 9 A77CGCCGCCG CGTCGACAGAG7AGCAGT7ATGCAAG CATtS7GAT7GTGG7TTTXGCAACC7G777G CAC 78 

CY752A5A 145 AG<5CA»AAGACGCCAAG<*AACC7A7C7-TGG7GCAGAAGG<*ACAGTCCG7^ 213 

CT?S2A53 221 AGCCA-AAGACGCiAAGGAACC7A77T-7GOT;CAGAAGCGCCAG7CCG7T« 288 

C7752ASA 1 0 

CY5S2A53 97 CCA7G-A7C7TrA7G77C7GGACACG7-7TwCAAGGAA7CG-CA7CC7«^ 164 

C*752E4A 1 0 
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CYP32A1A 269 ACTTCTCCTTTA0GCAATAGAAAAAGACTAACAGAACAGCG7rTTTA^ .33 S 

CYP52A2A 351 A CGAAA CTTT CCCACTCTTCAACTTCAATCTCCCAACCTCCTG 7CCAGGTGTCCCAAGTGAAA7CTTPAA 420 

CYP52A23' 257 ACGAGACTTTCCCAGTCTTCAACTTGAATGTCCCAACCTCCTGCCCAGGTGTCCCAAGTGAMTCrr 235 

CVPS2A3A 2S2 GTCAAAAGCTGA*ATAGTGCAGTTTAAACCACC7AAAATCACATATACAGCCTCTAGATACGACAGAGAA 350 

CYP52A3B 75 GACAAATGATCG -ACAGT- CGA7T- • ACGTAATCCXTATTATTTAGAGGCGTAATAAAAAATAAATGGCA 144 

CYP32A5A 214 CGCAGACGGACCCAGAGTATTTTGGGGCCGACGCTGGTGAdTTTAAGCCGGAGAGATGCTrTGATrCA** 281 

CYP52A5B 289 CGCAGACGGACCCAGAGTATTTTGGGGCAGATGCTGGTGAGTTCAAACCGGAGAGATGGTTTCATTCA** 355 

CYP52A6A 1 0 

CYP52ABB ItfS GT7AACGAGATCCA7AT7CACAACCCACCGCAAC-G7CAC^7GC?CM 232 

CYP52D4A. 1 0 



* • * 

CYP52A1A 339 ATTTTTTTAGTCCCACCATTCTCTGGG17GCTC7GGGT^ 4 OS 

CYP52A2A 421 CC CAACCAAGG CC7GGACCGG * AAGG7G77GAC7CC77CAACAAGGAAATCAAGTCTT7GGCTGGTAAGT 489 

CYP52A23 237 CCCAACCAAGG CCTGGACCG - *AAGGTGTTGACTCCTTCAACAAGGAAATCAAGTCTTTGGC7GGTAAGT 304 

CYP52A3A 361 GC7CT77ATGATC7GAAGAAGCATTAGAATAGCT* - -AC7A7GAGCCAC7A7TGGTG7A7A7ATTAGGGA 427 

CYF52A33 145 GCC AGAATTTCAAJ^CAOTTCCAAAC^TGCAAAAGATGAGAAACTCCAACAGA 210 

C7752A5A 282 AGCATCAAGAAC77GGGGTG7AAATAC77GCCG~CJtt7GC7GGSCCACGGAOT^ 351 

CYPS2A53 357 AG CATGAAG AAC77GGCG7G7AACTAC77G CCG77CAA7GCTGGGCCCCG GACTTG7T7GGGCCACCAGT 425 

CYP52A8A 1 0 

CYP52A83 233 ACCCCCACAAGAACAG7GGAATAATGCCAGTCAX-CAAAGAG7GGTCACAGACGAGGGAGAAAACGCAA0 301 

CYP52D4A 1 0 



CYP52A1A 409 TTCAGATGGAAGAACAAAGAGATAAAAAACAAAJJUAAACTCAGTTrTGCA 474 

CYP52A2A' 490 77GC7GAAAAC- •T7CAAGACC7A7GC7GACCAAGC7ACCGC7GA- •AG7GAGAGCTGCAGG7CCAGAA0 555 

CYP52A23 305 77GC7GAAAAC* *TrCAAGACCTA7GC7GACCAAGC7ACCGC7GA* *AG77AGAGCTGCAGG7CCAGAAG 370 

CYP52A3A 428 77GG7GCAAT7AAGTACGTACTAATAAACAGAAGAAAATACT7AACCAAT7TCT 497 

CY?52A33 211 ACTCCGCAGC- •AC7CCGAACCAACAAAACAA7GGGGGGCGCCAG* -AA7TA7TGAC- • -TATT 267 

CYPS2A5A 352 ACAC777GATTGX\GCGAGCTAC77GCTAGTCCC<rrTGCCCCXGACCTAC" CGGGCAA7AGA7TTG 416 

CYJ52A53 427 ACACTT7GA77GAAGCGAGCTATTTGCTAG7CAGG77GGCGCAGACC7AC-CGGC7A^^ 491 

CYP52ASA 1 0 

CY752A83 302 CAACAG 7GGT7C7GA7GCAAGA7CAGC7ACAC CGCTTCA7CAGG AAAAG C-AGGAGC7CCCACCAC 366 

CYPS2D4A 1 GA7G7GGTGC77GA7T7C7CCACAC;CA7CC7TG7GAGGTGCCA7GAATC7G7ACCTG 58 

. CYP52A1A 475 -A7GATATCA7CCAC7CGC7AAACGAA7CA7G7GGG7GA7C77C7C777AG77 543 

CVPS2A2A 555 C77AAAGATA77^A77CA77A777AG777GCC7AX77AOTCTCAT7ACCCA^^ 624. 

CYP52A33 3 71 C77AAAGA7A777A77CAC7AT77AGT7TGCC7A777AmCTCA7CACCCA^ 439 

CYP52A3A 493 * 7GAGGGACC7777C7GAACAT7CGGG 7 CAAA C * i i n ' ii i GGAGiGCGACATCGA' A I \ i V CGTTTGTGT 555 

CYP52AJ3 258 G7GAC H 17 1 1 2 1 Ai ik * 1 * CCG77AA- - C777CA77G CAG7G AAG7GT- - G77ACACGGGG7GCT 329 

CY752A5A 417 -CAGCCAG<5ATCGCCG7ACC*CACCAAGAAAGAAG7CGT7CA7CAACA7GAGTC 4 84 ' 

CY752AS3 492 - C7GCCAGGG7CGGCGTACC- CACCAAGAAAGAAG7CG77GA7CAA7A7G AS7GCTGCCGA7GGGG7GGT 559 

CY752A9A 1 0, 

CY752AS3 357 - CA7A7CCCCA7CACGAGCAACACCAGCAGG77A5TG7ATAG7AG7C7G7AG7TAAG7CAA7 435 

CY752D4A 59 -7CrG7AAGCACAGGGAAC7GC77CAAaXCr7AT7CCA7A77Cr^ 127 



. C7PS2A1A 544 ACA7GA.^C7CAAA7CCAAA-7ACAC7ACAC7CCC^7A77G7CC77CG77m 612 

CYP52A2A 625 A7A7AAAGT7AC77CGCA 7A7CAT7G7AA7CG7G CG7G7CG GAATTGGA7GA777GCAA 663 

CYP52A23 440 A7A7AAAG7TA777CGCAAC-7CA7A---7A7CAnG7AA7CG7GCG7C77CC^77CC<;7AAT77GAAA 505 

CV752A3A 5 57 AATAA7AG7GAACCTr7G7G-7AA7AAA7C77CA7GCAAGAC77Ctt7AATrCGAGCT^ 635 

Cx?S2A33 330 GA7GG7GT7GG7T7C7ACAA-7GCAAGGGCACAG77GAACCT77CCACA7AACGT *7G CACCA7A7CAAC 397 

CY252A5 A 4*5 7GT- - AAAGC7TXA7AAGGA-7G7AACGG7ACA7GCA7AC77G7G7AGGAGGAGCGGAGA7AAArrAGAT 551 

CYP52A53 560 7GT- -AAAG7T*CACAAGGA-7C7AGA7GGA"A7G7A*AGG7G7G7AGGAGGAGCGGAGA7AAAT7AGAT 625 

CY7S2A8A 1 . 0 

CY752A83 436 CC^- -A7AAGAC7ATCCCTT-CT7ACAACCAAG77TTC7GCCGCGCCrG7C7GGCA^ 501 

CYP5204A 123 GA7A7CTGCCAAGG7ATATAGCAGA^CG7GC7GA7GGTrCC7CCGG7CA7ATTCTG7TGG7AG7TC7GCA 197 
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CYF52A1A 

CYP53A3A 

CY952A2B* 

CYPS2A3A 

CYP52A3B 

CYPS2A5A 

CYP52ASB 

CYP52A8A 

CY?52A8B 

OT5204A 



CYF52A1A 
CYF53A2A 
CYFS2A23 
CYF52A3A 
CYF52A33 
C7352A5A 
CY752A5B 
CYT32A8A 
CY552AB3 
C7752D4A 



265 ACT7C7CCTTTAGGCAATAG AAAAAGACTAAGAG AACAG CGTTTTTACACGTTC CATTCCTTAATCTACT .3 3 ft 

351 ACGAAACXTTCCCACTCTTCAACTTCAATCTCCCAXCCTCCTCTCCAGCTCn'CCCAAGTGXAATC il^AA 420 

167 ACGACACTrrCCCACTCTTCAACTTCAATCTCCCAACCTCCTCCCCACCTC^ 236 

292 G TCAAAAGCTG A - ATACTG CAGTT7AAAG CACC7 AAAATCACATATACAG C CT C7AC AT ACQ ACAG ACAA 360 

79 G ACAAATGATCG - ACAGT- CCATT- - ACGTAATCCATATTATTTAGAGGCGTAATAAAAAATAAATGGCX 144 

214 CGCAGACGGACCCAGAGTATmCGGCCCC^icGCTGG7Cjtfm 281 

289 CGCAGACGGACCCAGAGTATTTTCGCGCACIATGCTGCTCMTTCAMCCG^ 356 

165 GT7AACGAGATCCATA7TCACAACCCACCCCAAGCTGACAATGCT 232 
1 



339 A rTTTlTTAG7CCCAGCA77CTGTCGGT7CCT^ 

< 21 CCCAACCAAGG CCTGG ACCGG - AAGCTGTTTGACTCCTT/CAACAAGGAAATCAAG TCTT7GCCTGGTAAGT 
3*7 CCCAACCAAGG CCTGG ACCO - • AAGGTGTTG ACTCCTTCAACAAGGAAA7CAAGTCTT7GGC7GGTAAGT 
\ ' g cTCTrrATGATCTGAAGAAGCATTAGAATAGCT* - -ACTATCAGCCACTA7TGGTC7ATATATTAGGGA 
14S GCC- - * -AGAATTTO^AACA7TXWCAAACAATGCAAAAGATGAGAAAC7CCAA 
282 AG CA7GAAGAACT7GGGGTG7AAATAC77G CCG77CAATG CXGCG CCACGGAC7TGC77GGGGCAGCAGT 
357 AGCATGAAGAACTTGGGGTGTAACTACT7GCCGT7CAA7GCTGW 

233 ACC CC CACAAGAACAGTGGAATAATG CCAG TCAA- CAAAGAG7GG7GACAGACGAGGG AGAAAACGCAAG 
1 



408 
489 

304 
427 
210 
351 
426 
0 

301 
0 



CYP52A1A 
CYP52A2A 
CYP52A23 
CY752A3A 
CY752A33 
CTC52A5A 
CY752A53 
CYP52ABA 
CY552A53 



4 09 7TCAGATGGAAGAACAAAGAGATAAAAAACAAAA?AAAACTGAGTTT^ 

4 90 77GC7GAAAAC- -T7CAAGACCTATGCTGACCAAGCTACCGCTGA- • AG7GAGAG CTGCAGG7CCAGAAG 
s ^QCTCAAAAC- -T7CAAGACC7ATGC7CACCAAGCTACCGCTGA- - AGT7 AG AG CTGCAGj i CCAGAAG 
4 2 8 TTGGTGCAATTAAGTACGTACTAATAAACAGAAGAAAATACTrAACCAATrTCT 

2 11 AC7CCGCAGC- - ACTCCGAACCAACAAAACAATGGGGGGCGCCAC- - AAT7ATTG AC- - -TAT? 

352 AC*CT77GA7TGAAGOTAGCTACT7GCTAG7CCG^ 

427 ACACTT7G A7TG AAG CG AGCTATTTGCTAGTCAGGTTIKiCGCAGACCTAC- CGGGTAA7CGATT7G 

3 02 CAACAGTCGTTCTGA7GCAAC»TCJ^C7ACACCGCTTCA7C^ 

2 . GATGTGG7GC7TGAmcrCGAGACACA7CCrrC7GAGGTGCCATGAATC7G7ACCTG"-- 



474 
555 
370 
497 
267 
416 
491 
0 

366 
58 



CT752A1A 
CYJS2A2A 
07S2A23 
CY752A3A 
Of? 52 A3 3 
OT52A5A 
CY752A53 
CY752A9A 
CV752AS3 
CY752D4A 



475 
SSi 
371 
493 
253 
417 
492 
1 

357 
59 



-ATGATATCATCCACTCGCTAAACGAATCATGTGwTGAT 

C77AAAGATA77TAT7CA7TA7TTAG777GCCTA777A7TTC7CAT7ACC CA7C - A7 C^7^CAACAC7A7 
C77AAACA7A777A7TCACTA7T7AG777CCC7A777A:TT^ 

- 7GAGC-0ACC7T77C7GAACAT7CGGG TCAAAC7777Tr7TGGAG7GCGACA7CG A a a * * aCGTT7G7G* 
G7GA C 1"! 4 1 1 1 i I A T777T7 CCGT7AA- » C77TCAT7GCAGTGAAG7 GT - - GT7ACACGGGG * GGT 

- CAGCCAGGA7CGGCG7ACC* CACCAAGAAAGAAGTO*7TGA7CAACATGAG7^C7GCCGACC^GG^G7T 
-C7GCCAGGG7CGGCG7ACC-CACCAAGAAAG;kAG-CGTrGA7C^ 

-CA7A7GCCCA7CACGAGCAACACCACCAGG7TA37S7A7AG7AGTW 
-7CTG7AAGC^CAC<XiAAC7GCrrCAACACCT7A77GCA7ATrCT 



543 

624. 
439 
556 
329 
464 
559 
0. 
435 
127 



CWS2A1A 
CY?32A2A 
CY?S2A23 
CY732A3A 
CY752A33 
CY?52A5A 
CVP52A53 
CY752A8A 
CY7S2A83 
CYP52D4A 



544 ACA7GAAAC7CAAA7CCAAA-7AttCTACAC7CCCXCTA77CTCC77CC77^ 

623 A7A7AAAGT7ACT7CGGA 7A7CArrG7AATCG7GCG7G7CGCAA77GGA7GA^GGAA 

44 0 A7A7AAAGTTATT7CGGAAC- 7CA7A- - -7A7CA77CTAA7CC7CCC7G77GCAA77^^^GAAA 
567 AATAA7AG7G AACC777G7G -7AA7AAATC7TCA7G^AAGAC77GCA7AA7TCGAG CTTGXmaGTTC^CG 

3 3 0 CA7GG7GTOG777C7ACAA-TGCAACGCCACAGTTGAAW 

435 7GT- - AAAGCTT7ATAAGC A- 7G7 AACGC7AGA7GGA7AG T7 G7G7AGG AGG AGCGGAGA7AAA77«GAT 
5 50 7G7- - AAAGTT7CACAAGGA-7CTAGATGGA7ATGTA* AGGTGTGTAGGAGGAGCGGAGATAAA77ACAT 

4 3 6 CCA- -A7AAGACTA7CCCTT- CTTACAACCAAG7TTTC7GCCGCCCCTG7CTCGCA-ACAGA7GC7GGCC 
12 8 GATA7C7G CCAAGGTATATAG CAGAA CG7GC7GATCG77CC7CCCG7CA7 AT7C7G77GG7AGTXC7G CA 



612 
683 
505 
635 
397 
551 
625 
0 

501 
197 
CYP52A1A
CYP52A2A
CYP52A2B
CYP52A3A
CYP52A3B
CYP52A5A
CYP52A5B
CYP52A8A
CYP52A8B
CYP52D4A



613 TTACTTTTGACCTCXTAGCAGTTCCCTCTCACXGATCAC^CACATTATCACACTCACAT^ 

6 B 4 CTCCGCTTCAAACCCATTCATCCACCAACCGCACA • TAAAAGATTACCT- - - AATTTATCTCCTGA3ACA 

506 CT GTAGTTGGAACGGATTCATGCACGATGCGCAGA • TAACACG AG ATTATCT C CTAAG.\CA 

636 C— CAATTTGACCTCGTTCATGTGATAAAAGAAAAGCCAAAAGGTAATT- - -AGCAGACGC- - •AATGGO 

398 T - - CAATTT ATCCTCATTCATG TG A7AAAAGAAC AG CCAAAAGG TAATT * - -GGCACACCCCCCAAGGGO 

552 TTCATTTTG - * - TGTAAGGTTTTG GATGTCAACCTACTCCG CACTTCATGCA- GTGTGTGTGACACAAGO 

626 TTGATTTTG - - *TGTAAGGTTT AG CACGTCAAG CTAC7CCG CACTTTGT CTGTACGOAGCACA— 

502 GACACACTT- * •TCAACTGAGTITGG1CTAGAATTCTTGCACATGCACGACA- AGG AAACTCTTACAAAG 
19 B GGTAAAirrGGATGTttGGTAGTGGAGGGAGGTTTGTATCGGTTGTGTT-TrCTTCTrCCTCTCT 



682 
749 
565
697 
462 
617 
685
0 

567 
266 



CYP52A1A
CYP52A2A
CYP52A2B
CYP52A3A
CYP52A3B
CYP52A5A
CYP52A5B
CYP52A8A
CYP52A8B
CYP52D4A



683 TCCTATCTCATGCTGTGTGTCTCTGGTTGGTTCATGAGTT^ 

750 ATTTrAGCCGTGTrCACACGCCCTrCTrTGTT- C7GAGCGAAGGAT- - AAATAATTAG ACTTCCACAGCT 

566 ATT7WCCTCAT7CACACGCCC77C7T CTGAGCTAAGGAT- - AAATAATTAG ACT7CACAAGTT 

698 AACATGGAGTGGAAAG CAATGGAAG CACG CCC- AGGACGGAG7AAT7T AG7CCACACTACATC7GG3GGT 
K 63 AACACCCAGTAG AAAG CAATGG AAACACG CCC- ATG ACAGTGCCATTTAGCCCACAACACATCTAGTATT 
618 GTG TAC7ACGTG70CGXGTCCGCCAAGAGACA- - -GCCCAAGGGGG - - TGGT AGTGT- G7G7TGG CGGAA 

6 8 6 - - -TACTCCG7CTGCGCC7CTGCCAAGACACG- - - GCCCAGGGG TAGTGT- GTGCTGGTGGAA 

1 GAATTCTTTCGAT C7AA77CCAGCTGA7C- — TTGC7AATCCT- -TATCAACGTAGTTGTGATCATT 
568 - - ACAACACTTG7GCTCTCA7GCCACTTGATC- - -77GCTAACCC7- -TA7CAACG7AAT7GAGATCATT 
267 XTXCAACCTCCACGTCTCCTTCGGGTlCTCTGTCTGTGTCrGAGTC- - GTACTGTTGG ATTAAG7CCATC 
7 S 1 GGAAAGCAAAGCTAACTAAA' i i t i C ' i I I G I CACAGG7ACACTAACCTGTAAAACTTCACTGCCACGCCAQ

8 17 CATTCTAATTTCCGT- - -CACGCGAATATTGAA GGGGGGTACATGTGGCCG C7GAA* 

629 CA7TAAAATATCCCT- - - CACC CGAAAAC7GCAACAA7AAGC AACGGGGGGGTAG ACGTAG CCGATGAA- 

7S7 7TTTTT7T7GTG CGCAAG7ACACAC CTGGAC7 - TrAGTTTTTG CCCCATAAAGTTAACAATCTAA» 

532 C I i i\ i i i a i I TGTC CGCAGGTGCACACCTGG ACT - TTAGTTATTG CCCCATAAAG7TAACAATCT/CA- 
682 GTG CA7G7GACACA - • -ACC CGTGGG7TC7GC CCAA7GGTGGACTAAGTG CAGGTAAGCAGCGACCTGAA 
742 GTCCATGTGACACA* - -A7ACCCTGG7TCTGGCCAAT/TGG<WATTTAG7G7AGGTAAGCT 

63 GTTTGTCTGAATTAT - - ACACACCAG7GGAAGAATA7GGTCTAATTTCCACGTCCCACTGGCATTGTG 
631 GTTTGTCTGAATTAT- - ACACAC CAGTGGAAGAATCTGG TCTAATC7GCACG CCT CA7GGG CATTGTG - - 
335 GCATGTGTGAAAAAAAGTAGCGCTTATTTAGACA^^ 

• ■ 
821 TCTTTCCTGATTGC^CAAGTGCACAAACTACA-ACCTGCAAAACAG CAC7CCGC77GTCACAGG7T 

"I ^S^^T^C^CTC - """S^-^^CW 

6S 5 ^TGrC^37GCC;.G7AAACGCAG7C7CTCTCTCCCCCCCCCCCCCCCCCCCC7C^GGAAiAGT^CAACGO 

831* CCTrTMC-TCTCCAACTCTCTCCGCCCCC^^ 

600 .CCTTTGGC-TCTCCCAGTGTCTCCGCCTCCACATGra^ 

74 9 ACAnCC7CAACCCT7AAGACACTGG7GG-TAGAGA7GCGGACCAGG CTA7TC77GTCG7-GC7A 

809 ACACTCC7CAACGC7TGAGACACT6G7GGGTAGAGA7GCGGGCCAGGA- - GGC7AT/TCTTG7CG7-GC7A 

12 9 -70777 G7»3GX3G«KK«KKKrrGCA(^CA77777AG7GCCA 77C777G* iGATT^C-CCCT 

697 - TG TTTT- - C<<<<yXGGGGGGGG$G TGCACACATTTTT^GTGCGAAaGTTTGTTTG C7G G i i C C- CCC7 
4 05 G7GCACGACCA7GAGTA7GCAAC77CACGAGACGTCG77AGGA A7CCACAG AA7G AT AG CAGG AA 
8 6 S G7C7CCTC7CAACCAAaJ%AAAXX7A^GA77AAAC777 4 * A7C7C7G 955

919 AGGAAGGATAACGG AT AG AAAG CGCAATG CGAGGAAAA7 - - 777G AACG CGCA^GAAAXCCAAXATCCGO 93 6 

764 GGGAAGGA7AACGGA7AGCAAG7G0AA7GCGAGGAAAA7- -TTTGAATGCGCAAGGAJ^GCAA^TCCGO 831 

898 ^CACCCATTAGXGGAATGGCCCAAAGTTA^ >" 

6ffS AACACCCATG AGGGGAATGGG • CAAAGTrAAACACTTT7CK3T77CAATGA7TCCTA7TTGC7AC7 - 729 

612 CCCGCCGCATGGA • AAATCAAC7GCGGCAAGAA* -7AAA7TTATCCG7AGAA7CCACAGAGCG G 872 

876 CCCG -7GCACGGA-AAA7CGA77GAGCGAAGAA* - CAAA7TTATCCGTGAAATCCACAGAGCG 0 935 

188 CCCCCCTATCA7---TCAT7CCCACAGCATTAG«7tTmCCTC;.C7CGAATTCGC7GTCC -. 244 

763 cccccctcccccc7atcatccccac;jx;at7^^ • "* 

470 GCTTACTACGTGAGAGATTCTGCTTACAGGATG - -77 CTCT7C77G77GAT7CCAT7AGG7GGn* i AT CAT 5 3 7 
o*6 A- • AACACTTCCCTTTCTCTAATCTCTGCCAAA - CTCAAACTGCAAAACTAACCACAGAATG AT 1016

9 87 GCTACCACCTTTTCACCCACCCAACACACTCCTATTTCTCCTCAATCACTCAACATAOAAAAAA* * * " * *°S0 

B32 GCTATCACCTTTXGAGCCAGGGCACACACTCCT.mCTC^ * 00 

9 SB CTTCTCTTGTTTTGTGCTTIti AATTGC 1034 
730 * - -CTC UU1 U lb iti l I rTGATTTCCACCATGTCAAATAMCGACAATTATATATACCTTTTCCTC- - * 793 

973 A- .TAMTTTGCCCACCTCCATCATCAACCACG'CCCCCACTAACTACATCACTCCCCTAi 1 1 i - 932 

936 A--TAAATTTGTCACATTGCTCCGTTCCCCAC CCACAGCATTCTC 978 

345 ACCTGTCAACCCCCCCCCCCCCCCCCC- CCACTGCC- * CTACCCTGCCCTGC 292 

B U ACCTGTCAACCCCCTCAC TGCCCTGCCCTCC 852 

IU CTCCGGTGGTGACAACTTGACACAAGCAGTTCCGAGAACCACCCACAACAATCACCATrCCAGC 601 
in , 7 i^CCCTCACAATTATATAAACTCACCCACATTTCCACAGACCGTAATTTCATGTCTCA

"J; CACCAAGACGCAATGAAACCCACATGGACA7TTACACCTCCCCACATGTGATAGTTTGTCTTAAC 

«ni AACTCCACCJ^CACAA7GAA7CGCACA7CCACA7~^^ 

Si! TGTCWCAATGTCTCTTTTTGCrGCCAT— GCTTrrTGCTTmCCTrrrGCACTCTCTCCCACTCCC 

oil C"CTCTCTCTCTTTGTCTTACTCCG CTCCCG7T7CCTTAC CCA CAG AT ACACACCCACT- GCAAACAG CA 

In I T7TTCTCTCTCTriOTCTTAC7CCGC7CCTG777CCT^ 

III rCTCCACGTCCrGTGTTTTGTGCTGTGTCTTTCCCACGCTATAAM 

854 cTOCACGCCCTCTGTTTTa«CTCTGGCACTCCCA« 

602 TATCACTTCTACATGTCAACCTACGATOTATCTCATCACCATCTAGTTTCTrGGCAAT 
rrr c i ,li i TTACT7AG7CAGGTTTGATAACTTCCrrrrrTAT7ACCCTATCTT

AGA AAAGTATAATAAGAACCCATGCCGTCCCTrTTCTTTCGCCGCTTCAAC TTTl n * * * 1 * IA 

AAAGCAAAAAAAG7ATAATAAGGACCCATGCCTTCCCTCT7CC7GGGCCGT^ 

ACAA AGAAAAAAAAACTACACTATGTCGTCTTCTCCATCGTTT 

a CAA7CAGT GCAG CAACACACAAAGAAG AAAAA7 AAAAAAACC7A 

GCA- - ACAATTAT AAAG ATACG C C AGG CCCACCT7CT i' 'i C i I 7TTCT7CA CI I IITT GACTGC-A 

ACG - - C7AG CC CAG C7GTC7TT CT 7T77C7TCA C ill* il IGGTGTGTTGCTTTTTTGGCTGC'-T 

TCCACCCAGCCAAAAAAACAGTCTAAAAAATTTGGTTGATCCTT^ 

TCCTCACAG CCAAAAAAA- --- AATTTG^CTGATCCTTTTCWCTGCAAGGTTTTTC^CC^C-C 

A7GGG7CAACA7CCAA7ACAAC7C CACCAA - • 7GAAGAAG AAAAACGG AAAG CAGAA7ACCAG i GACA 
A7TTA7ACCAACCAACC- - AACCAJGSCC^^

TC7T ACACACA7CACGACCA-7CAC7GrACACCATA77AXCCCCACAgAC?tCACCAA 

TTGTCTATCAACACACACACACCTCACCAC^^ 

CCCC AA(^GG7TCICGC7ACC^CTAG7CC77ACA7CCAG7AC^C77GACA-ACTACACCA0 

CCSC AGGAGG77C7CGCTACCACXAC7CC77ACA7CGAGTAC7TrCTTGACA-AC ; iACACCAa 

A C7T7C7 ACAA7CCACCACAG CCAC CACCACAG CCG C7A7GA7 T GAACAAC7CC7A GAA 7A* ± 

AC7T7C7ACAACC ACCACCACCACCACCACCA7GA77GAACAAATCC7AGAA7AT7 

ACCAC77CCACCA - -CC7CAAC7A77CGAACAA- «AAGA70C7CGA7CA0A7C7TACAT7ACT 

ACCACCACCACCA- - CC7CAAC7A77CAAACAA- »AGGATGC7CGACCAGA7C77CCA7?ACr 

G7G7C AGT7CC7GACCA77GC7AA7C7A-7GGC7A7A7C7AG777CC7A7CC7GGGA7G 

' * • 
1224 A7CG7ACAC7GTGA77AC7CCAGCAG7A77AG7C77CC77A7C7CCACAAACA7CAAGAA

llll ATGG7AM7GATAG7ACCAC7CGC7T*CA773CX~A7A 

All AXGGTACG7CA7AC7ACCAC7CGC777CA77GC77A7AGCG7CC7CGAC7AC«T^ 

W09 A7GCTrAC7ACT7CA7ACC777GGrGC77C777CG77GAAC777A7AAGT^ 

S96 AXC^AC7ACT7CA7CCC777GG7CC77C777CC77GAAC77CA7CACC77GC7CCA 

1128 --eeTATGTCGTXeTCCttC^ 

1X$7 - -GGTA7A77G77C7GCCTG7G77C*ACAICA7CAAACAACTCAX^ 

4 89 - .CG7ACA7TG7C77GCCA77G77 CGCCA77A7CAAC W0A7CG7CGC7CATGTCAGGACCAA w ATT*Q 

1042 --(WACATSGICITGCCAITGTTSCTCA^ 

791 -rcA7C7C7C7CC7C77CArr7GCGT77G7G77TA777CGGG7A7.C^7AT4G**AiACAA^*AW^*u 
162 8 TCTTCACACCACACTTTCCTACACACCACATTCCACACCTTAAACCCTTCCAACCACACATCCAAATCAT 1657
trCTTCACACCACAGTTICCCACACAACACATTTCCCACCTCAACTTGTTCCACCCACACCTTCACO.wTT 1715 



1520 X G 7 1 CAC AC CACACTTTCCCACU CAACACATTTCCCACCTCAACTTCTTCff ACCCACA CATCCAC CTCTT 1385 
1628 7GTTCAGACCACAGXTTGCTAGAGAACAC0TTTCTCACGTCAAOTTGTTCOACCCACACGTICAGGTCTT 1657 
1415 TGTTCACACCACAGTTTCCTACACAACACC7TTCCCACCrCAAGITG7TCCAACCACACGTTCAGCTCTT 14 84 
1536 TGTTGAGACCACAGTTlGCCACAGAACAAGTTCCTCATGTCACGTCGTrGCAACCACACTTCCAGTTGTX 1605 
157S TGTTAACACCACAGTTTGCCACAflAACAAGTrCCTCATGTCACGTCGTTGCAACCACACTTCCAGTTCTT 1644 
BIS TG77GAGACCACACXT7CCCAGACAACAAGX7CCXCAXGXGACGTCGX7GGAACCACACXXCCAGXXGTT 554 
mI* TC TrCACACCACAGTTTCCCACAGACCAACT7CCTCATGTCACGTCGTTGGAACCACATTTCCACTTCTT 1507 
1206 TG7TGCG7CCGCJ^TlTCCCJUaCA?CCCC?T-«CATAICCTGCAICTACAACCCCATTTTCT0TTCCT 127S 
CGCrAAGCACATCAAOTCAACCAGCCAAACACTTTCGATATCCAAGAATTGrrCTTTAGATTTACCCTC
CTTCAAACACGTCAGAAAGGCACAGGGCAAGACTTTTGACATCCAGG 

CTlCAAGCACGTCACmCGCACAGCGCJ^GACITTIGACATCCAAGAATTCTTTTTCACATICACCGXC 
C7 XCAAC CAC G7X AGAAACCAC C GCG G7CAAACG77CCACAXCCAAGAAXXGXXCX7 CAGC7XCACCGTC 
C77CAAGCAC 077AGAAAACACC GCG G7CAGAC7 77XCACAXCCAAGAAXIGXXC7XCAGA7X GACCG7C 
CAAGAACCA7AT-CT7AAGCAC^GGG7GAA7AC7TTGA7AXCCACGAATTGT7CTX7AGATrTACCCrr 
CAAGAAGCA7A7CCTXAAACACAAGCG7GAG7AC77XGn7A7CCAGGAAXXGXXC7XIAGAXIXACT 
CAACAAGCA7A7CC7TAAAttCAACGG7GAG7ACT7TGA7ATCCUGCUA7TGX7CT7rAffAT*TAC7GTC 
CAACXXCCA7A77C77AACCACAAGGG7GAA7AC77rGATA7CCAGGAATXGT7C7TTAGAT7TACCC« 
TCCCAACCACA77CA7CGCCACAA7CGAGACXAC7TCCACAXCCAGCAGCTCTACT7CCCGTIC*CCA*0 

. - *. A A AAA** * * * * * * * * " * * 
GACACCGC7ACT(yiGT7C7TG777GG7GAA7CCG77CAC7CC7rGTACGATCAAAAA77CCCCA*CCCAA
CAC7CCCCC^CCGAGT7TTTGTT7CG7GAA7CCG77GAGTCCXrGAGAGA7CAA7CTATCGGCA*GTCCA 
GAC7CCCCCAC7WGWXXTGXT7GG7CAA7CCG77GAGXCC7TaAGAGA7CAAXCTA*iGGGA7G7CCA 
GAC7CCGCCACCGAGXTCXTGXX7CGXCAG7WCC70AAXCCXTC»GCGACCAAXCXAX7GG^GACCC 
GAC7CCGCCACCCAG7ICT7GXX7CGXGAGXC7CC7GAAXCCTXGAGACACGACXC7GXTCGXTXCACCC 
GA77CCCCCUCGGAG7ICX7AX77GG7CA57CCG7GCACTCCTTAAAG0ACGAAXCTAXXGG*AXCAACC 
GAC7CCCCCACCCAGXXCXTAXX7GGXCAC7CCC7GCACXCCT7AAACCACGAAACXAXCCW^ 
GAC7CGCCCACCGAG7XCX7AXI7GG7GAG7CCG7GCACXCCXTAAAGCACCAGGAAAXXCGC*ACGA» 
CA77CAGC3ACCGAGXTCX7A777GG7CAG7CCG7GCAC7CCXXAAGGGACGAGCAAA7XCGC*ACGA*A 
CA7C7GCCCACGCGCX77I7G777GGCGAG7C7G7GGGGXCGXXGAAAGACCAAGA7GCCAGG- - 

- • - — - - - ** #• * 



# » * 



C7CCAAACGAAA- - -XCCCAGGJUGAGAAAAC777CCCCC7CCmCAACGrTXCCCAACACXAC^GCC 
7CAA7GCGC77GACXXXGACGGCAAGGC7GCC777GC7GA7GCMXXAACXAXXCCCAGAAwA*^G*w 

SCAATeCACTTCACttrtfACCC^ 

C^CCACCAASSAXXXCGA7GGCAGAACAGA777CGCrGACGCXTXCAAC*A*^CCA^ 
CAACCACCAA5SA7XXCCM5GCAGAGCAGA777CCC7GACCCXXXCAACrACXCGCAGAC**ATO 
;UGAC6A7A7AGA7XX7CC7GG7ACAAAGCAC777CC7CAG7CGXXCAACAAACCCCAGG^ 
AAGACGA7A7AGA77XXGC7GG7AGAAAGGAC777CCrCAG7CGT?CAACAAAGCCCA 

CGAAAGACA7GT---C7GAAGAAAGACCCACA777GCCGACGCCXXCA^^ 

CCAACGACA7GG- . -C7GAAG;JUGACCOJU777CCCGACGCG77C^AC^AGXCC<aAGXC*A* * * G*C 
7?CC7GGAAGCA77CAA7G\G7CGCACAAG7/v* * i QQC 
CACCAG;UG77AC7CCCACAC77777AC77777GACCAACCCXAAGGAAT7^

77CGAGAGCGG77A7CCAACAA77G7AC7CCC7G77CAACCGCAAAAAG77rAACGAG i GC^C^W 
77CGAGAGCG577A7GCAACPA7rG7AC7GGG7G77GAACGGG^^AAAG7XrAAGGAG A GCAACGCTAAA 
C7ACAGA77X77G77CCAACAAA7G7AC7GGA7C77GAA7GGCTCGGAAXXCAGAAAG7CCA^ 
C7ACACA77777G7XCCAA(^7GXAC7CGA7777GAA7GGCCCCGJAXXCAG^G7C^ 

TA77AGAACC7XGGXCCAGACC7XC7AC7CC77CG7CAACAACAACC 

XA77AGAA7777GG7GCAGACCT7C7AC7GGrrGA7C^CAACAACGAGXXXAGAGACTGAAC^GCTa 

CACCWAGX7CCX77ACACAAC77G7AC-CG77CG7CAACAACAAAGAGX7CAAG 
CACCACAG77CC777ACA(^CA77G7AC7GG77GG7CAACAACAAAGAGXXCAAGGAG7GCAAC 
AAC7AGGC<yL\CG77GCACGAG77G7AC7X7C7TXGXGACGGGTXXAGCX7rCCCCAG7ACAACAAGG * * 
973 GXCCACCACTTGGCCAAGTACTTTGTCAACAACCCCTTCA^
997 CTGCACAACmCCTCACTACTACCTCAACA^ 
87 0 GTCCACAAGTTrGCTCACTATTACGTCAGCAAGCCTTTCCACTT^ 
97ft GTGWAACTTTCCTCACCACTAW^ 

765 GTCaCAAGTTTCCTCACCACTATGTCCXAAACCCTTTCCAGTIGACCCACCATGACt^ 
fi I fi GICCACAAC^CACCAACTACTATGTTCAGAAAGCTTTCCATGCTACCCCXGAACACCW 

!" "gS«TTACCAAC7ACTATCT^ 

£17 GTCCGAAAGT TCTCCACCCAGTG7GTC CA CAAC G CGTTACATCTTCCACCGCAXCACACC A 

#••*** * * ** **••••* ** • 

04 5 CCAAGTCCCCTTACGTTTTCTTGTACGAATTCGTTAAC 

l\l xtggXT^- - - • •XrGXGXT77TG7ACGAA77GG7CAAGCAAACCfcG^ 2130 

HI JISSS ATCTGrrCTTGTACCACTTCGTCAAGCAAACCACACACACCCAAGTGTT^ 2003 

l\l ATGTGTTC7TGTACGA G 77GCC7AA GCAAACCACAGAC CCAAACC7 C 7 7GACAGACCA 2111 

!" J r GT C TTCTTG7ACCAC77GGCTAACCAAAC7AWGACCCAAACC« «H 

"! iV«WrCTIWACGWCWCT^ 2019 

S GCGGGt XtG7GT7CTX CT A7GACC77G7CAAGCACACCAGAGACCCCMG«^ MM 

noi GCCGCT ATGTGTTCTTC7ATCACCr7G7CAAGCACACGACAGACCCCAAGGTOTTCCWCACCX 13 6S 

;S CCCGGT ATG7G77C7T GTACCACCX 7GCCAAG CAGACCAAACACCCCAATG7 G77GCG7CACCA 1918 

"! GCoIcT ACGXGTT7«CCCCCAG77CG7CJUaCACAC7CGACA7CCCGT7G7XXTACAAGACCX 1541 

# *•••••*••• •* ** 

UJ GmTTCWKATTTTCWCCCCGTACACACACCACCCCCCCTra^ 

mo c^TCAtSt^irc«ccoAACA«cAccACT C ctoc(^crr<nccTi7C««^^a 
o» wSScatcttcttcccacgaagagacac^^^ 
3329 TAACGTWCCCCGTGTCAACTCAATTTGAC G--AGTAACTTCCTAAGCTCGAATTATGC 3385

3247 TAACGTTGCACCATATCAACTCAATTTATC CTCATTCATCTCATAAAACAAGAGCCAAA 3505 

3325 CATTGTCTTATTATTTCAGAGCAAACTAC ATCTTCAACATACTTCGGTATTTGAT 3379 
3574 GGTGGGTTTATCCCGCA-ATGGATAACTCGATTGAGCA-TCCCTGGAGCAATCGCAAAAGATCTGCCtAG

3517 CAACAAATCCTCGCGCGCAGTATTTCGACGAAAC--CACAACAAA7AAAAAAAACAAATTCTACACC«CT 

5434 ACTTTAGTTATTGCCC-CATAAAGTTAACAATCT--CACCTTTGGCTCTCCCAGTGTCTCCGCCTCCAGA 

3508 CAAGCGACATCAACAGTCTAGCATTCATCAATTG--CA7CAACTACTCGA 



3307 CAAGCGACATCAACAG7CTAGCATTTA7CAAC7G - -CAT CAACTACTCGAGGGGTC AACT ATTCTCCGC A 

3458 A7CAAAGTGACC7CCCAGA7CAAGA7C77GA7TGACAAG77CAAGG7G7ACTTGT 7TG«G7iGCC*G 

34 68 XTCAAAGTCACC7CCCA(:ATCAAGATCTTGATTGACAAGTTCAAGGTGTACnGT---i i G^GTTGCCCG 
2737 TCCCCATA6CCTAGAAGCACCAGC3&GA7GATGGAGCAACTC^ 

3217 CTCCCA7AGCCTAGAACCATOIAA;AGA7GA47GAGCAACTCC7CCAG7AC7GGTACAT7GCaC^ 
3035 G7TACAACGACTGAGGC7GI7GGCCG7G7GACCAA77GG777C7TTGGTGACCTAGAT7GC7CCCGCAGG 
TG TAT7AAACTACA7ACAGAAA7AAAAACGTG7C7XGA77CAT7GGT7I GGT7CTTGT7GGG7^

T C77T77C77CAC WG7CAACAAAAAACAACAAA77A7ACACCATT7CAACGAT7TT1 GC i Cu^i 

7G C7CG777T ACACCCTCGAG C7AACGACAAC AC AAC ACCC A7GAGGG GAATGGGCAAAG 7 i " " " 




GA^CCMGGCC^CTAGGGGGGCA^ 
CCCAGCCAATATTTCACATCATCTCCTAAXTTCTCCAACAATCCCA^CCTACCCTACTCCAGCACCCCCT
AAATCCTATATAATGCTTrAATTCAACTCACGTATCTTTAT*TTTACTCTrTTCA6CTCAACTATC T --T 
AAACACTTTTCCTTTCAATGATTCCTAmCCTACTCTCTTGTTTTGTC^TTTGAT^ - 0 

CACAA CTACAACAACGT ATTCG CATTG ATACTGAAGAACATCAGCG ATG AAG ACATCTTG ATCATAC* - A 
GACAACTACAAGAAAGT A7TC G CA77GAT ACTG AAG AACAT CAG TC ATG AAGAT AT CTTG AT CGT AC - - O 
AG AAGTTGACCACGGG CTTGATCAACTTGGCCTTCCAGAACAACAAGCAGCACTT<«GACGAGGT O 

AG AAGTTG ACCACGGGCTTC ATCAACTTGCCGTTCCAGAACAACAACCAGCACTre u 
«GCGCC»TTCACGCACACCttGTACGACGGCTC^^ 

r GGCGCCGTTCACGCACACCCAGTACGACGGATGGTATGGGTTCAAGTTrCGGCGGGAGTTTCT - G 
TGTCGTTGCGGGGAAATTCCCGC^TTrTTGTGTAACGAAAOT 

CTCAaATCTTXTT7»TA7C«CTTC7C»C«CCC0TCOAATC- -CCOTTCAGACCATTOTTACCTCTA 
S^IwC«CAATTATATATACCTTT. - - 7CGTCTGTCCTO - - "CAATCTCT- CTTTTTGCTGCCATT 
SIcTCCttTCGACACTACAATTGTTIA^^ 

»CATCTTCAACO*OTTCVTCCXCAAGTTCmOOC««CC«0- -CCGCAATTGAC 

aStCTTCAACGAGTTCATCGACAAGTTCTTTGGCAACACAGAG- -CCGCAATTGAC 

GCGAAGAAGATCGGGCGGCAGACGGACTTOGTGCATCCGCOGTZ- -««CWO" - " _ * 
GCGAAGAAGA77GGAAGGCAGACGGACTTCGTGCAT0CGaGTT--CC0TGGAG0GGO"^ 

„CMC«3AAC<»7CCACCCCttTTrCC<mOC^^ 

«CTOTWeeien«Tcn«»M»e™ 

GTCGAATAGACGGTT7GTTIACTCATTAGATGGTCCCAGATTACTTTTCAAGCCW 

CTTCTACAAGTACATCACTICAACAGT- - GTCACGAGACTACAACTCCAACATCGGCTCCACAGCCAAnG 
OTTCTACAAGTACATCACCTCAACAGT- - GTCGCAMACTACAACTCCAACATCGGAGCCAttGCCAAAS 
TCAAGA7CT7CT7GAAC77GGAC7CGTA7G7CCAC ^,^*rr-«r*fll5>eTaCTC

7G^CACC^G77CGA7GAC77C7CGC7CGG7GGCAC< ; A7CAGG77CT7GAAGCCG 

7GGCGACGCAG77CGA7GAC7777£G 
1 MATQE I IDS VLPYL TKWYTVITAAVLVFLI STNIKNYV 3 8

1 MTVHDIIATY FTKWYVIVPLALIAYRVLDYFYGRY 35 

1 MTAQDI I ATY ITKWYVIVPLALIAYRVLDYFYGRY 3 5 

1 MS S S P S FAQEVLATTS P Y I EYFLDNYTRWYYF I PLVLLS LNFI S LLHTRY 50 

1 MS S S P S FAQEVLATTS PY I EY FLDNYTRWYYFI PLVLLSLNFISLLHTKY SO 
! MIEQLLEY WYVVVPVLYIIKQLLAYTKTRV 30 

2 MIEQILEY WYIWPVLYIIKQLIAYSKTRV 30 

x MLDQILHY WYIVLPLLAIINQlVAHVRTtfY 30 

1 HLDQXTm WIVLPLLVIIKQIVAHARTNY 30 

1 MAISSLLSWD VICWFICVCVYFGYEYCYTKY 32 

3 9 XAKKLKCVDPP YLXDAGLTGI LS LI AAIXAKNDGRLANFAD EVFDE Y 85 
3 6 LMYXLGAKPFr QKQTDGCFGFXAPLELLXKKSDGTIiIDFTL- - -QRlEDli 82 
2 6 lWKMAKPFFQKQTDGYFGFKAPLELLKKKSDGTIiIDFTL---ERlQAIi 82 
51 lERRFHAKPWNFVRDPTFGIATPLLLIYLKSKGTVi^FAWGLWNNjCYIV 100 

51 LERRFKAKPWNVVZiDPTFGIATPLXM^ 100 

31 iJHKKLGAAPVTNXLYDNAFGIVNGWKAI^FKXEGRAQEYin)- - - YXFDHS 77 

31 iJ-lXQLGAAPITNQLYD^FGtVNGWKALQFKXEGRAQEYOT- - ~HXFDSS 77 

31 IjMXKLGAKPFTHVQRDGWLGFXFGREFLXAXSAGRLVDLI I - - - SRFKDN 77 

31 LMKKlrG AKP FTKVQ LDG W FG FXF.G RE FLXAXS AG RQVDLI I - - -SRFKDN 77 

33 IJ^KKGAREIEWINDGFFGFRLPLLU-IRASNEGRLISFSV- - -XRFSSA 79 

* * * 

8 6 pn--ktfylsvagalxivhtvz>?h:ni^^ 133 
8 3 drpdiptrtfpvrsinxvntlepenikaiiatqfndfslgtrhshfaplii 132 

83 NRPDIPTFTFPIrSINXrlSTLEPr^IKAILiATQFNDFSLGTRKSHFAPLIj 132 
101 RDPKYX7TGLRIVGL?LIETt<DPiNIKAVIATQFNDFSLGT?J2}FLYSLIi ISO 
101 KDPZ<YKTTGLRIVGLPLI3TIDPENIKAVIATQFOTFSLGTHCTFXjYSLL ISO 
78 XNPSVGTYVSILFGTRIVVTKDPSNIKAIIJ^ 127 
73 XNPSVGTYVSXLFGTKIVVTXDPENIXAI^ 127 

7 8 2D TFS S YAFGNI-T/-VFTRDPENIXALLATQFGDFS LGS RVXFFXPLIa 123 

78 c D TFS S YAFGNKVVFTRDPSNI3CALLATQ? GDFSLG S RVXF r XPLli 123 

8 0 PXPQNXTLWRALSVPVILTXDPVNIXAMLSTQFDDF 129 

* ★ ++** + ++♦/ ***** * ** 
• • • • • 

134 GDGIFTLDGEGKXHSRAMLRPQr ARDQIG!^<A^ 183 
133 GDGI FTLDGAG WXHS RSMLRPQ FAREQ IS KVKLLE PXVQVFFXHVRXAQG 182 
133 GDGI FTLDGAGWXHS RSMLRFQFAREQI SHVXLLS PKMQVFFXKVRXAQG 182 

i5i gdgiftldgagwxhsrtmlrpqf;lr£qvskvxllsphvqvffkhv^^ 200 
151 gdgiftldgagwxksrtmlrpqfarsqvskvx^^ 200 
128 gdgi ftldg egwxhs ramlrpqfareqvahvts le phfqllxxh ilxhxg 177 
128 gdgiftldgegwkksrs^rpqfareqvakvtslephfqllkxhilxhxo 177 
124 gygiftldaegwxhsramlrpqfareqvakvtslephfqllkxhilkhxg 173 

124 G YGI FTLDG EG WKH S RAMLRFQ ? ARSQVAKVTS LS PHFQLLKXHI LXHXG 173 
130 G KG I FTLDG PE WXQS RSMLR PQFAXDRV5 HILDLSPKFVLLRXH I DGKNG 179 
* ****** ♦+.*+,**++*+*.... +. **** • * 
! MATQEI IDSVLPYL TKWYTV ITAAVLVFLI STNI KNYV 38

1 MTVHDI I ATY FTKWYVIVPLALI AYRVLDYFYGRY 3 S 

1 MTAQDIIATY ITKWYVIVPLALIAYRVLDYFYGRY 35 

1 MSSSPSFAQEVLATTSPYIEYFLDWYTRVnfYFIPLVLLSLNFISIiLHTRY 50 

1 MSSSPSFAQEVrATTSPYIEYFLDNYTRWYYFIPLVI/LSLNFISLLHTKY 50 

x MIEQLLEY WYVyVPVLYIIKQLLAYTKTRV 30 

- MIEQILEY WYIWPVLYI IKQLIAYS KTRV 30 

. MLDQILHY WYIVLPLLAI INQI VAHVRTNY 30 

- MLDQIFHY WYIVLPLLVX IKQIVAHARTNY 30 

1 MAISSLLSWD VICWFICVCVYFGYEYCYTKY 32 

39 KAKKLKCVDPPYXKDAGLTGILSLIAAIKAKNDGRLANFAD EVFDEY 85 
3S LMYKLGAXPFFQKQTDGCFGFKAPLELLXXKSDGTLIDFTL QRIHDL 82 
3 6 LMYKLGAKPFFQKQTDGYTGFKAPLELLKKXSDGTLIDFTL— -ERIQAI- 82 
51 LERRFHAKPLGNFVRDPTFGXATPLLLIYLKSKGTVi-IXFAWGLWNNXYlV 100 
51 IjERRFHAXPIjGNVVLDPTFOIATPLILIYLICSKGTVI'IXFAWSFWNHKYIV 100 
3 1 LMKKLGAAPVTNKLYDNAFGIVNGWKAWFKKEGRAQEYOT- - - YKFDHS 77 
31 LMXQI^AAPITNQLYDNVFGIVNGWKAIjQFKXEGRAQSYKD- - -H3CFDSS 77 
31 LMKKLG AKP FTHVQRDGWLG FKFGREFLXAXSAGRLVDLI I - - - SRFKDN 77 
31 LMKXIX5AXPFTHVQLDGWFGFKFGREFLKAKSAGRQVPl.il- - -SRFHDtf 77 
33 LHHXKGAREIENVIKI)GFFGFRLPLLLl*!RASNEGRIiISFSV- - -KRFESA 79 

85 PN- - HTFYLS VAGALXI Vr^TVDPENIKAVZATQFTOFS WTRHAHFASLIi 133. 

83 DRPDIPTFTFPVFSI^VOTLEPENIKAILATQFlIDFSLGTRHSHrAPLL 132 

83 NRPDIPTFTFPIFSINLISTLEPENIXAILATQFNDFSLGTRHSHFAPLIj 132 
101 RDPKYXTTGLRIVGLPLISTMDP2NIXAVIATQFm)?SLG7R3)FLYSIi ISO 
101 KDPKYKTTGLRlVGLPLIETIDPZNIKAVIiATQFNDFSLGTRKDFLYSLIi 150 

78 XNPSVGTYVSILFGTRIVVTXDPENIKAIIATQFGDFSIXSXRHTLFXPIili 127 

73 KNPSVGTYVSILFGTXIVVTXDPEtfl^ 127 

78 £ D 7r S S YAFG^iVVFTRDPE^IXALliATQFGDFS LGS RVKFFKPLIj 123 

78 £ 0 TFSSYAFGNHVVFTRJDPENIXAliLATQFGnFSLGSRVKFFXPLIj 123 

80 PHPQNXTLVNRALSVPVILTXDPVNIKAMLSTQFDDrSLGLRLKQFAPLL. 129 

* * **** * ***/ ***** * ** 
• • • • • 

134 GDGIFTLDGEGWXHSSlAMLRPQ?ARI>QIG5iV3CALEPHlQIMA3CQIK^QQ 183 

133 G DG I F T LDG AG WXH S R S MLRP Q F AH EQ I S KVXL LE PKVQVFFXHVRXAQG 182 

133 GDGI FTLTXSAGWKKSRShUjRPQFAREQI SSVXLLEP34QVFFX3V*l^QG 182 

151 GDG I FTLDG AG WXH S RTMLRP Q FAREQVS KVXLLE PHVQVFFXHVRXHRG 200 

151 GIX3lFTLTCAGWXKSRTMLRPQFARSQVSKVXLLEPHVQVFFXHVR3GiRG 200 

128 GDGIFTLDGEGVT<HSRAMLRPQFAREQVAKVTSLEPHFQ1*LKXHILXHXG 177 

128 GDGI FTLDGEG WXK S R S MLR PQ FARE QVAKVTS LE PHFQ LLXXHILXHXG 177 

124 GYGlFTLDAEGWXHSRAMLRPQrAREQVAKVTSLEPHFQLLKXHILXHXG 173 

124 G YG I FTLDGEG WXH S RAMLR PQ FARSQVAKVTSLSPHFQLLXXHILXHXG 173 

130 GKGI FTLDGPEWXQSRSMLRPQr AXDRVSH ILDLEPHFVLLRXHIDGHNG 179 
* ****** ♦*.**.*****+*,... *. ***+ * 
383 TLRMYPSVPVNFRTATRDTTtPRGGGANGTDPIYIPXGSTVAYWYKTHR 432
381 TLRLYPSVPQNFRVATKNTTLPRGGGXDGLSPVLVRKGQTVIYGVYAAHR 430 
381 TLRLYPSVPQNFRVATKKTTLPRGGGK3X5LSPVLVRKGQTVMYGVYAAHR 430 
399 TLRLYP SV PHNFRVATRNTTLPRGGGEDGYS P I WKKGQWMYTVI ATHR 448 
399 AI^YPSVPHNFRVATROTTLPRGGGKDGCSPIVVKKGQVVMYTVIGTHR 448 
376 TLRIYPSVPRNFRIATKNTTLPRGGGSDGTSPILIQXGEAVSYGXMSTHL 425 
376 TLRVYPSVPRNFRIATKNTTLPRGGGPDGTQPILIQKGEGVSYGINSTHL 425 
371 TLR LHPS VPRNARFA I XDTT&PRGGG PNGKDP I til RXDEWQY? ISATQT 420 
371 TLRtHPSVPRNARFAIKDTTLPRGGGPNGKDPILIRKNEVVOYSlSATQT 420 
358 TLRlrYPSVPRNARFATRNTTLPRGGGPDGSFPILIRKGQPVGYFICATHL 407 
" ** **#* * ★ * . ******** * *. . * * * • 

433 LEEYYGKDAITOFRPERWFEPSTKKLGWAYVPrNGGPRVCLGQQFALTEAS 482 

431 NPAVYG XDALEFRP ERWFE PETKXLGWAFLPFNGGPR ICLGQQFALTEAS 480 
431 NPAVYGKDALEFRPERWFEPSTKKLGWAFLPFNGGPRICLGQQFALTEAS 480 
449 DPSIYGADADVFRPERWFEPETRXLGWAYVPFNGGPRICLGQQFALTEAS 498 
449 DPSIYGADADVFRPERWFEPETRiCLGWAYVPFNGGPRICLGQQFALTEAS 498 

426 D P VYYG PDAAE FRPERWFEPS7XKLGWAYLP FNGGPR ICLGQQFAI/TEAG 475 
426 DPVYYGPDAA3FRPSRWFEPSTRKLGWAYLPFNGGPRICLGQQFALTEAG 475 
421 NPAYYGADAADFRPERWFEPSTRl^LGWAFLPFNGGPRlCIiGQQFALTEAG 470 

421 NF AYYGADAADFRP ERWFEP STRNLG WAYLPFKGGPR ICLGQQFAI/TSAG 470 
408 NEKVYGNDSHVFRP2RWA^ZGKSLGWSYL?3T;GGPRSCLGQQFAILEAS 457 

■ — ******* +*****+. ** 
483* YVITRLAQMFETVS S DPGLSYPPPXCIKLTi'SSHNDGVpVKM
481 YVTVRLLQEFAKLS HDPDTEYP PXXMSHLTM 5 LFDGAN I£>IY 
481 YVTVRl»t^3FGKI*STOPNT3YPPRJQ'ISHLraSLFDGANJE^?Y 

499 yvtvrlmsfahlsVidpdteyppxmntltlslfdgadvr;^ 
499 yvtvrllq3fgnls ldphasyppxlqotltl s lfdgadvrmf 

476 yvLVRLVQSFSHVRLDPDSVYPPXRLTOLTOCIiQDGAlVXFD 

476 YVLVRLVQSFS H I ?*LDP DEVYP PXRLTNLTtfCLQDG AIVKFD 

471 YVLVRLVQ3FPNLSQD?3TXYPPPRIJU^TMCLFDG?-^<MS 

471 YVI,VRLVQSFPSLSQDPSTEYPP??JJUiLTI-:CLFDGAYV:<r*!Q 

4 5 B YVLARLTQCYTTIQL-R - TTS YPPXXIiVHZiTi'IS LXiNGVYIRTRT 
SEQUENCE LISTING

(1) GENERAL INFORMATION: 

(i) APPLICANT: Wilson, C. Ron
Craft, David L.
Eirich, Dudley
Eshoo, Mark
Madduri, Krishna M.
Cornett, Cathy A.
Brenner, Alfred A.
Tang, Maria
Loper, John C.
Gleeson, Martin

(ii) TITLE OF INVENTION : CYTOCHROME P450 MONOOXYGENASE AND NADPH 
CYTOCHROME P450 OXIDOREDUCTASE GENES AND PROTEINS RELATED 
TO THE OMEGA HYDROXYLASE COMPLEX OF CANDIDA TROPICALIS AND 
METHODS RELATING THERETO 

(iii) NUMBER OF SEQUENCES: 107 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: HENKEL CORPORATION 

(B) STREET: 2500 Renaissance Boulevard, Suite 200 

(C) CITY: Gulph Mills 

(D) STATE: PA 

(E) COUNTRY: U.S.A. 

(F) ZIP: 19406 

(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk

(B) COMPUTER: IBM PC compatible

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30



(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 

(C) CLASSIFICATION: 

(viii) ATTORNEY/AGENT INFORMATION: 
(A) NAME: Drach, John E. 

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:
CCTTAATTAA ATGCACGAAG CGGAGATAAA AG 32
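Each record states a LENGTH in base pairs that can be cross-checked against its sequence line. The following is a minimal Python sketch of that check, assuming the listing's layout of whitespace-separated blocks of bases followed by an optional running count. The helper name `sequence_length` is hypothetical and is not part of the listing.

```python
def sequence_length(line: str) -> int:
    """Count the bases on one sequence line of the listing.

    Assumption: blocks are whitespace-separated, and a purely numeric
    trailing token is the running base count, not part of the sequence.
    """
    tokens = line.split()
    if tokens and tokens[-1].isdigit():
        tokens = tokens[:-1]  # drop the trailing count, if present
    return sum(len(block) for block in tokens)


# SEQ ID NO:1 as given in the listing (stated LENGTH: 32 base pairs)
seq_id_1 = "CCTTAATTAA ATGCACGAAG CGGAGATAAA AG"
print(sequence_length(seq_id_1))  # 32
```

The same call works for lines that already carry the count, such as the SEQ ID NO:2 line, because the numeric suffix is ignored.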
(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2: 
CCTTAATTAA GCATAAGCTT GCTCGAGTCT 30 



(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 
CCTTAATTAA ACGCAATGGG AACATGGAGT G 31 



(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 34 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4: 
CCTTAATTAA TCGCACTACG GTTATTGGTA TCAG 34 



(2) INFORMATION FOR SEQ ID NO:5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 
CCTTAATTAA TCAAAGTACG TTCAGGCGG 29 



(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 34 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6: 
CCTTAATTAA GGCAGACAAC AACTTGGCAA AGTC 34



(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 
(2) INFORMATION FOR SEQ ID NO: 13:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:
GGGTTTAAAC 10 



(2) INFORMATION FOR SEQ ID NO: 14: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:14:
AGGCGCGCC 9



(2) INFORMATION FOR SEQ ID NO: 15: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:
CCTTAATTAA 10
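SEQ ID NOS: 13 to 15 are short sequences that each contain an 8-base stretch reading the same on both strands. The sketch below, offered only as an illustration, finds that stretch. The function names are hypothetical, and the enzyme names in the comments (PmeI, AscI, PacI) reflect general knowledge of these recognition sites rather than anything stated in this listing.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_palindromic_site(seq: str) -> str:
    """Return the first longest substring equal to its own reverse complement."""
    n = len(seq)
    for size in range(n, 1, -1):          # try the longest windows first
        for start in range(n - size + 1):
            window = seq[start:start + size]
            if window == reverse_complement(window):
                return window
    return ""


for name, seq in [("SEQ ID NO:13", "GGGTTTAAAC"),   # contains GTTTAAAC (PmeI site)
                  ("SEQ ID NO:14", "AGGCGCGCC"),    # contains GGCGCGCC (AscI site)
                  ("SEQ ID NO:15", "CCTTAATTAA")]:  # contains TTAATTAA (PacI site)
    print(name, longest_palindromic_site(seq))
```

The CCTTAATTAA prefix shared by SEQ ID NOS: 1 to 6 carries the same TTAATTAA stretch found in SEQ ID NO:15.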



(2) INFORMATION FOR SEQ ID NO: 16: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 3..4

(D) OTHER INFORMATION: /note= "y=dCTP or dTTP"

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 9..10

(D) OTHER INFORMATION: /note= "w=dATP or dTTP"
(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 15..16

(D) OTHER INFORMATION: /note= "w=dATP or dTTP"
(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 18..19

(D) OTHER INFORMATION: /note= "w=dATP or dTTP"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:16:
TCYCAAACWG GTACWGCWGA A 21
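The feature notes for SEQ ID NO:16 mark degenerate positions: Y stands for C or T and W for A or T under the standard IUPAC nucleotide codes. As a hedged illustration, the Python sketch below enumerates every concrete oligonucleotide a degenerate primer represents. The `IUPAC` table covers only the codes that appear in this part of the listing, and the function name `expand_primer` is hypothetical.

```python
from itertools import product

# IUPAC codes used in this part of the listing (subset of the full table)
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine
    "Y": "CT",    # pyrimidine
    "W": "AT",
    "M": "AC",
    "N": "ACGT",  # any base
}


def expand_primer(primer: str) -> list[str]:
    """List every unambiguous sequence encoded by a degenerate primer."""
    cleaned = primer.replace(" ", "").upper()
    choices = [IUPAC[base] for base in cleaned]
    return ["".join(combo) for combo in product(*choices)]


variants = expand_primer("TCYCAAACWG GTACWGCWGA A")  # SEQ ID NO:16
print(len(variants))   # 16 (one Y and three W positions: 2 * 2 * 2 * 2)
print(variants[0])     # TCCCAAACAGGTACAGCAGAA
```

The count grows quickly with N positions: a primer with four N and one Y, such as SEQ ID NO:23, expands to 4 * 4 * 4 * 4 * 2 = 512 sequences.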
(2) INFORMATION FOR SEQ ID NO: 17:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 12..13

(D) OTHER INFORMATION: /note= "y=dCTP or dTTP"
(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 15..16

(D) OTHER INFORMATION: /note= "w=dATP or dTTP"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:17:
GGTTTGGGTA AYTCWACTTA T 21

(2) INFORMATION FOR SEQ ID NO: 18: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:18: 
CGTTATTATC ATTTCTTC 18

(2) INFORMATION FOR SEQ ID NO: 19: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 3..4

(D) OTHER INFORMATION: /note= "m=dATP or dCTP"
(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 9..10

(D) OTHER INFORMATION: /note= "r=dATP or dGTP"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:
GCMACACCRG TACCTGGACC 20

(2) INFORMATION FOR SEQ ID NO: 20: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 
ATCCCAATCG TAATCAGC 18
(2) INFORMATION FOR SEQ ID NO:21:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 
ACTTGTCTTC GTTTAGCA 18 



(2) INFORMATION FOR SEQ ID NO: 22: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:22:
CTACGTCTGT GGTGATGC 18 

(2) INFORMATION FOR SEQ ID NO: 23: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 3..4

(D) OTHER INFORMATION: /note= "n=dATP or dCTP or dGTP or dTTP"

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 6..7

(D) OTHER INFORMATION: /note= "y=dCTP or dTTP"

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 9..10

(D) OTHER INFORMATION: /note= "n=dATP or dCTP or dGTP or dTTP"
(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 12..13

(D) OTHER INFORMATION: /note= "n=dATP or dCTP or dGTP or dTTP"

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 15..16

(D) OTHER INFORMATION: /note= "n=dATP or dCTP or dGTP or dTTP"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:
CGNGAYACNA CNGCNGG 17

(2) INFORMATION FOR SEQ ID NO: 24: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 3..4

(D) OTHER INFORMATION: /note= "r=dATP or dGTP"
(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 6..7

(D) OTHER INFORMATION: /note= "y=dCTP or dTTP"
(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 9..10

(D) OTHER INFORMATION: /note= "n=dATP or dCTP or dGTP or dTTP"

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 12..13

(D) OTHER INFORMATION: /note= "n=dATP or dCTP or dGTP or dTTP"

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 15..16

(D) OTHER INFORMATION: /note= "n=dATP or dCTP or dGTP or dTTP"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:24: 
AGRGAYACNA CNGCNGG 17 



(2) INFORMATION FOR SEQ ID NO: 25: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 3..4

(D) OTHER INFORMATION: /note= "n=dATP or dCTP or dGTP or dTTP"

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 6..7

(D) OTHER INFORMATION: /note= "r=dATP or dGTP"
(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 9..10

(D) OTHER INFORMATION: /note= "y=dCTP or dTTP"
(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 12..13

(D) OTHER INFORMATION: /note= "y=dCTP or dTTP"
(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 15..16
(D) OTHER INFORMATION: /note= "n=dATP or dCTP or dGTP or dTTP"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:25: 
AGNGCRAAYT GYTGNCC 17 

(2) INFORMATION FOR SEQ ID NO: 26:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..2

(D) OTHER INFORMATION: /note= "y=dCTP or dTTP"

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 4..5

(D) OTHER INFORMATION: /note= "n=dATP or dCTP or dGTP or dTTP"

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 7..8

(D) OTHER INFORMATION: /note= "r=dATP or dGTP"

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 10..11

(D) OTHER INFORMATION: /note= "y=dCTP or dTTP"

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 13..14

(D) OTHER INFORMATION: /note= "y=dCTP or dTTP"

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 16..17

(D) OTHER INFORMATION: /note= "n=dATP or dCTP or dGTP or dTTP"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:
YAANGCRAAY TGYTGNCC 18 

(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 

ATTCAACGGT GGTCCAAGAA TCTGTTTGG 29 

(2) INFORMATION FOR SEQ ID NO: 28: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:
GAGCTATGTT GAGACCACAG TTTGC 25 

(2) INFORMATION FOR SEQ ID NO: 29:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:
CTTCAGTTAA AGCAAATTGT TTGGCC 26 

(2) INFORMATION FOR SEQ ID NO: 30: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:
CTCGGGAAGC GCGCCATTGT GTTGG 25 

(2) INFORMATION FOR SEQ ID NO: 31:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:31: 
TAATACGACT CACTATAGGG CGAATTGGC 29 

(2) INFORMATION FOR SEQ ID NO: 32: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 3..4

(D) OTHER INFORMATION: /note= "r=dATP or dGTP"

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 4..5

(D) OTHER INFORMATION: /note= "y=dCTP or dTTP"
(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 16..17

(D) OTHER INFORMATION: /note= "y=dCTP or dTTP"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:
TGRYTCAAAC CATCTYTCTG G 21 

(2) INFORMATION FOR SEQ ID NO: 33:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:
GGACCGGCGT TAAAGGG 17 



(2) INFORMATION FOR SEQ ID NO:34: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 9..10

(D) OTHER INFORMATION: /note= "w=dATP or dTTP"
(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 12..13

(D) OTHER INFORMATION: /note= "y=dCTP or dTTP"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 
CATAGTCGWA TYATGCTTAG ACC 23 

(2) INFORMATION FOR SEQ ID NO: 35:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 
GGACCACCAT TGAATGG 17 

(2) INFORMATION FOR SEQ ID NO: 36: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 540 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:36: 

ATGATTGAAC AACTCCTAGA ATATTGGTAT GTCGTTGTGC CAGTGTTGTA CATCATCAAA 60 

CAACTCCTTG CATACACAAA GACTCGCGTC TTGATGAAAA AGTTGGGTGC TGCTCCAGTC 120 

ACAAACAAGT TGTACGACAA CGCTTTCGGT ATCGTCAATG GATGGAAGGC TCTCCAGTTC 180 

AAGAAAGAGG GCAGGGCTCA AGAGTACAAC GATTACAAGT TTGACCACTC CAAGAACCCA 240 

AGCGTGGGCA CCTACGTCAG TATTCTTTTC GGCACCAGGA TCGTCGTGAC CAAAGATCCA 300 

GAGAATATCA AAGCTATTTT GGCAACCCAG TTTGGTGATT TTTCTTTGGG CAAGAGGCAC 360 

ACTCTTTTTA AGCCTTTGTT AGGTGATGGG ATCTTCACAT TGGACGGCGA AGGCTGGAAG 420 

CACAGCAGAG CCATGTTGAG ACCACAGTTT GCCAGAGAAC AAGTTGCTCA TGTGACGTCG 480 

TTGGAACCAC ACTTCCAGTT GTTGAAGAAG CATATTCTTA AGCACAAGGG TGAATACTTT 540 
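
Longer sequences in this listing are printed in blocks of ten bases with a running base count at the end of each line. The following Python sketch, offered only as a convenience for checking a transcription, recomputes that running count and reports any line whose printed number disagrees. The function name `check_running_counts` is a hypothetical helper, and the two sample lines are the first two lines of SEQ ID NO: 36 as printed above.

```python
def check_running_counts(lines):
    """Compare the printed running base count on each line with a recount.

    Each line is expected to hold space-separated base blocks followed by
    the cumulative count. Returns a list of (line_index, printed, counted)
    tuples for the lines that disagree.
    """
    mismatches = []
    total = 0
    for index, line in enumerate(lines):
        *blocks, printed = line.split()
        total += sum(len(block) for block in blocks)
        if int(printed) != total:
            mismatches.append((index, int(printed), total))
    return mismatches


# First two lines of SEQ ID NO: 36, copied from the listing.
SEQ_36_HEAD = [
    "ATGATTGAAC AACTCCTAGA ATATTGGTAT GTCGTTGTGC CAGTGTTGTA CATCATCAAA 60",
    "CAACTCCTTG CATACACAAA GACTCGCGTC TTGATGAAAA AGTTGGGTGC TGCTCCAGTC 120",
]

if __name__ == "__main__":
    print(check_running_counts(SEQ_36_HEAD))
```

An empty result means every printed count matches the number of bases actually present up to that line.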

(2) INFORMATION FOR SEQ ID NO: 37:

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 25 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:
CCGATGAAGT TTTCGACGAG TACCC 25

(2) INFORMATION FOR SEQ ID NO: 38:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:
AAGGCTTTAA CGTGTCCAAT CTGGTC 26

(2) INFORMATION FOR SEQ ID NO: 39: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:
ATTATCGCCA CATACTTCAC CAAATGG 27

(2) INFORMATION FOR SEQ ID NO: 40: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40:
CGAGATCGTG GATACGCTGG AGTG 24

(2) INFORMATION FOR SEQ ID NO: 41: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41:
GCCACTCGGT AACTTTGTCA GGGAC 25

(2) INFORMATION FOR SEQ ID NO: 42: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:
CATTGAACTG AGTAGCCAAA ACAGCC 26

(2) INFORMATION FOR SEQ ID NO: 43: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43: 
CCTACGTTTG GTATCGCTAC TCCGTTG 27

(2) INFORMATION FOR SEQ ID NO: 44: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44: 
TTTCCAGCCA GCACCGTCCA AG 22

(2) INFORMATION FOR SEQ ID NO: 45: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45:
GCAGAGCCGA TCTATGTTGC GTCC 24

(2) INFORMATION FOR SEQ ID NO: 46: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46:
TCATTGAATG CTTCCAGGAA CCTCG 25

(2) INFORMATION FOR SEQ ID NO: 47:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47:
AAGAGGGCAG GGCTCAAGAG 20

(2) INFORMATION FOR SEQ ID NO: 48: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48:
TCCATGTGAA GATCCCATCA C 21

(2) INFORMATION FOR SEQ ID NO: 49: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49: 
CTTGAAGGCC GTGTTGAACG 20

(2) INFORMATION FOR SEQ ID NO: 50: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50: 
CAGGATTTGT CTGAGTTGCC G 21

(2) INFORMATION FOR SEQ ID NO: 51: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51: 
CCATTGCCTT GAGATACGCC ATTGGTAG 28

(2) INFORMATION FOR SEQ ID NO: 52: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52: 
AGCCTTGGTG TCGTTCTTTT CAACGG 26

(2) INFORMATION FOR SEQ ID NO: 53: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53:
TTGGGTTTGT TTGTTTCCTG TGTCCG 26



(2) INFORMATION FOR SEQ ID NO: 54: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54: 
CCTTTGACCT TCAATCTGGC GTAGACG 27

(2) INFORMATION FOR SEQ ID NO: 55: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55:
GTTTGCTGAA TACGCTGAAG GTGATG 26

(2) INFORMATION FOR SEQ ID NO: 56: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56:
TGGAGCTGAA CAACTCTCTC GTCTCGG 27

(2) INFORMATION FOR SEQ ID NO: 57: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57:
TTCCTCAACA CGGACAGCGG 20

(2) INFORMATION FOR SEQ ID NO: 58:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58:
AGTCAACCAG GTGTGGAACT CGTC 24



(2) INFORMATION FOR SEQ ID NO: 59: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 49 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59: 
GGATCCTAAT ACGACTCACT ATAGGGAGGA AGAGGGCAGG GCTCAAGAG 49 
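
SEQ ID NO: 59 as printed ends with the 20 bases of SEQ ID NO: 47, and the 29 bases in front of it begin with GGATCC (the BamHI recognition site) followed by TAATACGACTCACTATAGGG (the T7 promoter sequence). The Python sketch below merely makes that decomposition explicit. The function name `split_tailed_primer` and the labels it returns are hypothetical, and reading the three bases between promoter and gene-specific part as a spacer is an assumption, not a statement from the listing.

```python
BAMHI_SITE = "GGATCC"
T7_PROMOTER = "TAATACGACTCACTATAGGG"

# Sequences copied from the listing, block spacing removed.
SEQ_47 = "AAGAGGGCAG GGCTCAAGAG".replace(" ", "")
SEQ_59 = "GGATCCTAAT ACGACTCACT ATAGGGAGGA AGAGGGCAGG GCTCAAGAG".replace(" ", "")


def split_tailed_primer(tailed: str, gene_specific: str) -> dict:
    """Split a tailed primer into site, promoter, spacer and 3' portion.

    Raises ValueError when the primer does not follow the layout
    BamHI site + T7 promoter + spacer + gene-specific sequence.
    """
    if not tailed.endswith(gene_specific):
        raise ValueError("primer does not end with the gene-specific sequence")
    tail = tailed[: len(tailed) - len(gene_specific)]
    leader = BAMHI_SITE + T7_PROMOTER
    if not tail.startswith(leader):
        raise ValueError("5' tail does not start with BamHI site + T7 promoter")
    return {
        "site": BAMHI_SITE,
        "promoter": T7_PROMOTER,
        "spacer": tail[len(leader):],  # assumed to be a short spacer
        "gene_specific": gene_specific,
    }


if __name__ == "__main__":
    print(split_tailed_primer(SEQ_59, SEQ_47))
```

The same layout can be checked for other pairs in the listing, for example SEQ ID NO: 61 against SEQ ID NO: 37.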

(2) INFORMATION FOR SEQ ID NO: 60: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60: 
TCCATGTGAA GATCCCATCA CGAGTGTGCC TCTTGCCCAA AG 42 
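
SEQ ID NO: 60 as printed begins with the 21 bases of SEQ ID NO: 48, and the reverse complement of its remaining 21 bases occurs in SEQ ID NO: 36. The Python sketch below shows one way to confirm this from the printed text. The helper `reverse_complement` and the variable names are hypothetical, and only bases 301 to 420 of SEQ ID NO: 36 are copied in to keep the example short.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# Sequences copied from the listing, block spacing removed.
SEQ_48 = "TCCATGTGAA GATCCCATCA C".replace(" ", "")
SEQ_60 = "TCCATGTGAA GATCCCATCA CGAGTGTGCC TCTTGCCCAA AG".replace(" ", "")

# Bases 301-420 of SEQ ID NO: 36 (two printed lines).
SEQ_36_301_420 = (
    "GAGAATATCA AAGCTATTTT GGCAACCCAG TTTGGTGATT TTTCTTTGGG CAAGAGGCAC "
    "ACTCTTTTTA AGCCTTTGTT AGGTGATGGG ATCTTCACAT TGGACGGCGA AGGCTGGAAG"
).replace(" ", "")

# Portion of SEQ ID NO: 60 that follows the SEQ ID NO: 48 prefix.
three_prime_part = SEQ_60[len(SEQ_48):]
target = reverse_complement(three_prime_part)
offset = SEQ_36_301_420.find(target)

if __name__ == "__main__":
    print(target)
    # Convert the 0-based offset in the excerpt to a 1-based position.
    print(301 + offset)
```

With the excerpt starting at base 301, the match begins at base 344 of SEQ ID NO: 36.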

(2) INFORMATION FOR SEQ ID NO: 61: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 54 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61:
GGATCCTAAT ACGACTCACT ATAGGGAGGC CGATGAAGTT TTCGACGAGT ACCC 54 

(2) INFORMATION FOR SEQ ID NO: 62: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 52 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:62: 
AAGGCTTTAA CGTGTCCAAT CTGGTCAACA TAGCTCTGGA GTGCTTCCAA CC 52 

(2) INFORMATION FOR SEQ ID NO: 63: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 56 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 63: 
GGATCCTAAT ACGACTCACT ATAGGGAGGA TTATCGCCAC ATACTTCACC AAATGG 56 

(2) INFORMATION FOR SEQ ID NO: 64: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 52 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 64:
CGAGATCGTG GATACGCTGG AGTGCGTCGC TCTTCTTCTT CAACAATTCA AG 52 

(2) INFORMATION FOR SEQ ID NO: 65:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 49 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 65:
CATTGAACTG AGTAGCCAAA ACAGCCCATG GTTTCAATCA ATGGGAGGC 49 

(2) INFORMATION FOR SEQ ID NO:66: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 54 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 66:
GGATCCTAAT ACGACTCACT ATAGGGAGGG CCACTCGGTA ACTTTGTCAG GGAC 54 

(2) INFORMATION FOR SEQ ID NO: 67: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 56 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 67: 
GGATCCTAAT ACGACTCACT ATAGGGAGGC CTACGTTTGG TATCGCTACT CCGTTG 56 

(2) INFORMATION FOR SEQ ID NO: 68:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 48 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 68: 
TTTCCAGCCA GCACCGTCCA AGCAACAAGG AGTACAAGAA ATCGTGTC 48 

(2) INFORMATION FOR SEQ ID NO: 69: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 53 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:69: 
GGATCCTAAT ACGACTCACT ATAGGGAGGG CAGAGCCGAT CTATGTTGCG TCC 53 

(2) INFORMATION FOR SEQ ID NO: 70: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 70: 
TCATTGAATG CTTCCAGGAA CCTCGCCACA TCCATCGAGA ACCGG 45 

(2) INFORMATION FOR SEQ ID NO: 71:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 49 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 71: 
GGATCCTAAT ACGACTCACT ATAGGGAGGC TTGAAGGCCG TGTTGAACG 49 

(2) INFORMATION FOR SEQ ID NO: 72: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 72: 
CAGGATTTGT CTGAGTTGCC GCCTGATCAA GATAGGATCC TTGCCG 46 

(2) INFORMATION FOR SEQ ID NO:73: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 56 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 73: 
GGATCCTAAT ACGACTCACT ATAGGGAGGG GTTTGCTGAA TACGCTGAAG GTGATG 56 

(2) INFORMATION FOR SEQ ID NO: 74: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 52 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:74: 
TGGAGCTGAA CAACTCTCTC GTCTCGGGTG GTCGAATGGA CCCTTGGTCA AG 52 

(2) INFORMATION FOR SEQ ID NO: 75: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 49 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 75: 
GGATCCTAAT ACGACTCACT ATAGGGAGGT TCCTCAACAC GGACAGCGG 49 

(2) INFORMATION FOR SEQ ID NO: 76: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 49 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 76:
AGTCAACCAG GTGTGGAACT CGTCGGTGGC AACAATGAAA AACACCAAG 49

(2) INFORMATION FOR SEQ ID NO:77: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 57 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 77: 
GGATCCTAAT ACGACTCACT ATAGGGAGGC CATTGCCTTG AGATACGCCA TTGGTAG 57 

(2) INFORMATION FOR SEQ ID NO: 78:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 53 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:78: 
AGCCTTGGTG TCGTTCTTTT CAACGGAAGG TGGTCTCGAT GGTGTGTTCA ACC 53 

(2) INFORMATION FOR SEQ ID NO: 79: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 55 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 79:
GGATCCTAAT ACGACTCACT ATAGGGAGGT TGGGTTTGTT TGTTTCCTGT GTCCG 55 

(2) INFORMATION FOR SEQ ID NO: 80: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 80: 
CCTTTGACCT TCAATCTGGC GTAGACGCAG CACCACCGAT CCACCACTTG 50 

(2) INFORMATION FOR SEQ ID NO: 81: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4206 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 81:

CATCAAGATC ATCTATGGGG ATAATTACGA CAGCAACATT GCAGAAAGAG CGTTGGTCAC 60 

AATCGAAAGA GCCTATGGCG TTGCCGTCGT TGAGGCAAAT GACAGCACCA ACAATAACGA 120 

TGGTCCCAGT GAAGAGCCTT CAGAACAGTC CATTGTTGAC GCTTAAGGCA CGGATAATTA 180 

CGTGGGGCAA AGGAACGCGG AATTAGTTAT GGGGGGATCA AAAGCGGAAG ATTTGTGTTG 240 

CTTGTGGGTT TTTTCCTTTA TTTTTCATAT GATTTCTTTG CGCAAGTAAC ATGTGCCAAT 300 

TTAGTTTGTG ATTAGCGTGC CCCACAATTG GCATCGTGGA CGGGCGTGTT TTGTCATACC 360 

CCAAGTCTTA ACTAGCTCCA CAGTCTCGAC GGTGTCTCGA CGATGTCTTC TTCCACCCCT 420 

CCCATGAATC ATTCAAAGTT GTTGGGGGAT CTCCACCAAG GGCACCGGAG TTAATGCTTA 480

TGTTTCTCCC ACTTTGGTTG TGATTGGGGT AGTCTAGTGA GTTGGAGATT TTCTTTTTTT 540 

CGCAGGTGTC TCCGATATCG AAATTTGATG AATATAGAGA GAAGCCAGAT CAGCACAGTA 600 

GATTGCCTTT GTAGTTAGAG ATGTTGAACA GCAACTAGTT GAATTACACG CCACCACTTG 660 

ACAGCAAGTG CAGTGAGCTG TAAACGATGC AGCCAGAGTG TCACCACCAA CTGACGTTGG 720 

GTGGAGTTGT TGTTGTTGTT GTTGGCAGGG CCATATTGCT AAACGAAGAC AAGTAGCACA 780 

AAACCCAAGC TTAAGAACAA AAATAAAAAA AATTCATACG ACAATTCCAA AGCCATTGAT 840 

TTACATAATC AACAGTAAGA CAGAAAAAAC TTTCAACATT TCAAAGTTCC CTTTTTCCTA 900 

TTACTTCTTT TTTTTCTTCT TTCCTTCTTT CCTTCTGTTT TTCTTACTTT ATCAGTCTTT 960 

TACTTGTTTT TGCAATTCCT CATCCTCCTC CTACTCCTCC TCACCATGGC TTTAGACAAG 1020 

TTAGATTTGT ATGTCATCAT AACATTGGTG GTCGCTGTAG CCGCCTATTT TGCTAAGAAC 1080 

CAGTTCCTTG ATCAGCCCCA GGACACCGGG TTCCTCAACA CGGACAGCGG AAGCAACTCC 1140 

AGAGACGTCT TGCTGACATT GAAGAAGAAT AATAAAAACA CGTTGTTGTT GTTTGGGTCC 1200 

CAGACGGGTA CGGCAGAAGA TTACGCCAAC AAATTGTCCA GAGAATTGCA CTCCAGATTT 1260 

GGCTTGAAAA CGATGGTTGC AGATTTCGCT GATTACGATT GGGATAACTT CGGAGATATC 1320 

ACCGAAGACA TCTTGGTGTT TTTCATTGTT GCCACCTATG GTGAGGGTGA ACCTACCGAT 1380 

AATGCCGACG AGTTCCACAC CTGGTTGACT GAAGAAGCTG ACACTTTGAG TACCTTGAAA 1440 

TACACCGTGT TCGGGTTGGG TAACTCCACG TACGAGTTCT TCAATGCCAT TGGTAGAAAG 1500 

TTTGACAGAT TGTTGAGCGA GAAAGGTGGT GACAGGTTTG CTGAATACGC TGAAGGTGAT 1560 

GACGGTACTG GCACCTTGGA CGAAGATTTC ATGGCCTGGA AGGACAATGT CTTTGACGCC 1620 

TTGAAGAATG ATTTGAACTT TGAAGAAAAG GAATTGAAGT ACGAACCAAA CGTGAAATTG 1680 

ACTGAGAGAG ACGACTTGTC TGCTGCTGAC TCCCAAGTTT CCTTGGGTGA GCCAAACAAG 1740 

AAGTACATCA ACTCCGAGGG CATCGACTTG ACCAAGGGTC CATTCGACCA CACCCACCCA 1800 

TACTTGGCCA GAATCACCGA GACGAGAGAG TTGTTCAGCT CCAAGGACAG ACACTGTATC 1860 

CACGTTGAAT TTGACATTTC TGAATCGAAC TTGAAATACA CCACCGGTGA CCATCTAGCT 1920 

ATCTGGCCAT CCAACTCCGA CGAAAACATT AAGCAATTTG CCAAGTGTTT CGGATTGGAA 1980 

GATAAACTCG ACACTGTTAT TGAATTGAAG GCGTTGGACT CCACTTACAC CATCCCATTC 2040 

CCAACCCCAA TTACCTACGG TGCTGTCATT AGACACCATT TAGAAATCTC CGGTCCAGTC 2100 

TCGAGACAAT TCTTTTTGTC AATTGCTGGG TTTGCTCCTG ATGAAGAAAC AAAGAAGGCT 2160 

TTTACCAGAC TTGGTGGTGA CAAGCAAGAA TTCGCCGCCA AGGTCACCCG CAGAAAGTTC 2220 

AACATTGCCG ATGCCTTGTT ATATTCCTCC AACAACGCTC CATGGTCCGA TGTTCCTTTT 2280 

GAATTCCTTA TTGAAAACGT TCCACACTTG ACTCCACGTT ACTACTCCAT TTCGTCTTCG 2340 

TCATTGAGTG AAAAGCAACT CATCAACGTT ACTGCAGTTG TTGAAGCCGA AGAAGAAGCT 2400 

GATGGCAGAC CAGTCACTGG TGTTGTCACC AACTTGTTGA AGAACGTTGA AATTGTGCAA 2460 

AACAAGACTG GCGAAAAGCC ACTTGTCCAC TACGATTTGA GCGGCCCAAG AGGCAAGTTC 2520 

AACAAGTTCA AGTTGCCAGT GCATGTGAGA AGATCCAACT TTAAGTTGCC AAAGAACTCC 2580 

ACCACCCCAG TTATCTTGAT TGGTCCAGGT ACTGGTGTTG CCCCATTGAG AGGTTTTGTC 2640 

AGAGAAAGAG TTCAACAAGT CAAGAATGGT GTCAATGTTG GCAAGACTTT GTTGTTTTAT 2700 

GGTTGCAGAA ACTCCAACGA GGACTTTTTG TACAAGCAAG AATGGGCCGA GTACGCTTCT 2760 

GTTTTGGGTG AAAACTTTGA GATGTTCAAT GCCTTCTCCA GACAAGACCC ATCCAAGAAG 2820 

GTTTACGTCC AGGATAAGAT TTTAGAAAAC AGCCAACTTG TGCACGAGTT GTTGACTGAA 2880 

GGTGCCATTA TCTACGTCTG TGGTGATGCC AGTAGAATGG CTAGAGACGT GCAGACCACA 2940 

ATTTCCAAGA TTGTTGCTAA AAGCAGAGAA ATTAGTGAAG ACAAGGCTGC TGAATTGGTC 3000 

AAGTCCTGGA AGGTCCAAAA TAGATACCAA GAAGATGTTT GGTAGACTCA AACGAATCTC 3060 

TCTTTCTCCC AACGCATTTA TGAATCTTTA TTCTCATTGA AGCTTTACAT ATGTTCTACA 3120 

CTTTATTTTT 'mVI'lTm 1 TTATTATTAT ATTACGAAAC ATAGGTCAAC TATATATACT 3180 

TGATTAAATG TTATAGAAAC AATAACTATT ATCTACTCGT CTACTTCTTT GGCATTGACA 3240 

TCAACATTAC CGTTCCCATT ACCGTTGCCG TTGGCAATGC CGGGATATTT AGTACAGTAT 3300 

CTCCAATCCG GATTTGAGCT ATTGTAGATC AGCTGCAAGT CATTCTCCAC CTTCAACCAG 3360

TACTTATACT TCATCTTTGA CTTCAAGTCC AAGTCATAAA TATTACAAGT TAGCAAGAAC 3420 

TTCTGGCCAT CCACGATATA GACGTTATTC ACGTTATTAT GCGACGTATG GATGTGGTTA 3480 

TCCTTATTGA ACTTCTCAAA CTTCAAAAAC AACCCCACGT CCCGCAACGT CATTATCAAC 3540 

GACAAGTTCT GGCTCACGTC GTCGGAGCTC GTCAAGTTCT CAATTAGATC GTTCTTGTTA 3600 

TTGATCTTCT GGTACTTTCT CAATTGCTGG AACACATTGT CCTCGTTGTT CAAATAGATC 3660 

TTGAACAACT TTTTCAACGG GATCAACTTC TCAATCTGGG CCAAGATCTC CGCCGGGATC 3720 

TTCAGAAACA AGTCCTGCAA CCCCTGGTCG ATGGTCTCCG GGTACAACAA GTCCAAGGGG 3780 

CAGAAGTGTC TAGGCACGTG TTTCAACTGG TTCAACGAAC ATGTTCGACA GTAGTTCGAG 3840 

TTATAGTTAT CGTACAACCA TTTTGGTTTG ATTTCGAAAA TGACGGAGCT GATGCCATCA 3900 

TTCTCCTGGT TCCTCTCATA GTACAACTGG CACTTCTTCG AGAGGCTCAA TTCCTCGTAG 3960 

TTCCCGTCCA AGATATTCGG CAACAAGAGC CCGTACCGCT CACGGAGCAT CAAGTCGTGG 4020 

CCCTGGTTGT TCAACTTGTT GATGAAGTCC GAGGTCAAGA CAATCAACTG GATGTCGATG 4080 

ATCTGGTGCG GGAACAAGTT CTTGCATTTT AGCTCGATGA AGTCGTACAA CTCACACGTC 4140 

GAGATATACT CCTGTTCCTC CTTCAAGAGC CGGATCCGCA AGAGCTTGTG CTTCAAGTAG 4200 

TCGTTG 4206 
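
The opening residues printed for SEQ ID NO: 83 (Met Ala Leu Asp Lys Leu Asp Leu Tyr Val) can be compared with the ATG-initiated stretch of SEQ ID NO: 81 that reads ATGGCTTTAG ACAAGTTAGA TTTGTATGTC. The Python sketch below translates such a stretch with the standard genetic code, which is what the amino acid sequences printed in this listing appear to follow. The function `translate` and its `ctg_as_ser` switch are hypothetical. The switch is included only because Candida species of the CTG clade are known to decode CUG as serine, and whether that applies to any use of these sequences is left as an assumption for the reader.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}


def translate(dna: str, ctg_as_ser: bool = False) -> str:
    """Translate DNA in frame 1 into one-letter amino acid codes.

    ctg_as_ser=True reads CTG as Ser (alternative yeast code); the
    default is the standard code. Trailing partial codons are ignored.
    """
    dna = dna.upper().replace(" ", "")
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        codon = dna[i:i + 3]
        aa = CODON_TABLE[codon]
        if ctg_as_ser and codon == "CTG":
            aa = "S"
        protein.append(aa)
    return "".join(protein)


def to_three_letter(protein: str) -> str:
    """Render one-letter codes in the three-letter style of the listing."""
    return " ".join(THREE_LETTER[aa] for aa in protein)


if __name__ == "__main__":
    orf_start = "ATGGCTTTAG ACAAGTTAGA TTTGTATGTC"
    print(to_three_letter(translate(orf_start)))
```

The excerpt used here contains no CTG codon, so both settings of the switch give the same ten residues.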

(2) INFORMATION FOR SEQ ID NO: 82: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4145 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 82: 

TATATGATAT ATGATATATC TTCCTGTGTA ATTATTATTC GTATTCGTTA ATACTTACTA 60 

CATTTTTTTT TCTTTATTTA TGAAGAAAAG GAGAGTTCGT AAGTTGAGTT GAGTAGAATA 120 

GGCTGTTGTG CATACGGGGA GCAGAGGAGA GTATCCGACG AGGAGGAACT GGGTGAAATT 180 

TCATCTATGC TGTTGCGTCC TGTACTGTAC TGTAAATCTT AGATTTCCTA GAGGTTGTTC 240 

TAGCAAATAA AGTGTTTCAA GATACAATTT TACAGGCAAG GGTAAAGGAT CAACTGATTA 300 

GCGGAAGATT GGTGTTGCCT GTGGGGTTCT TTTATTTTTC ATATGATTTC TTTGCGCGAG 360 

TAACATGTGC CAATCTAGTT TATGATTAGC GTACCTCCAC AATTGGCATC TTGGACGGGC 420 

GTGTTTTGTC TTACCCCAAG CCTTATTTAG TTCCACAGTC TCGACGGTGT CTCGCCGATG 480 

TCTTCTCCCA CCCCTCGCAG GAATCATTCG AAGTTGTTGG GGGATCTCCT CCGCAGTTTA 540 

TGTTCATGTC TTTCCCACTT TGGTTGTGAT TGGGGTAGCG TAGTGAGTTG GTGATTTTCT 600 

TTTTTCGCAG GTGTCTCCGA TATCGAAGTT TGATGAATAT AGGAGCCAGA TCAGCATGGT 660 

ATATTGCCTT TGTAGATAGA GATGTTGAAC AACAACTAGC TGAATTACAC ACCACCGCTA 720 

AACGATGCGC ACAGGGTGTC ACCGCCAACT GACGTTGGGT GGAGTTGTTG TTGGCAGGGC 780 

CATATTGCTA AACGAAGAGA AGTAGCACAA AACCCAAGGT TAAGAACAAT TAAAAAAATT 840 

CATACGACAA TTCCACAGCC ATTTACATAA TCAACAGCGA CAAATGAGAC AGAAAAAACT 900 

TTCAACATTT CAAAGTTCCC TTTTTCCTAT TACTTCTTTT TTTCTTTCCT TCCTTTCATT 960 

TCCTTTCCTT CTGCTTTTAT TACTTTACCA GTCTTTTGCT TGTTTTTGCA ATTCCTCATC 1020 

CTCCTCCTCA CCATGGCTTT AGACAAGTTA GATTTGTATG TCATCATAAC ATTGGTGGTC 1080 

GCTGTGGCCG CCTATTTTGC TAAGAACCAG TTCCTTGATC AGCCCCAGGA CACCGGGTTC 1140 

CTCAACACGG ACAGCGGAAG CAACTCCAGA GACGTCTTGC TGACATTGAA GAAGAATAAT 1200 

AAAAACACGT TGTTGTTGTT TGGGTCCCAG ACCGGTACGG CAGAAGATTA CGCCAACAAA 1260 

TTGTCAAGAG AATTGCACTC CAGATTTGGC TTGAAAACCA TGGTTGCAGA TTTCGCTGAT 1320 

TACGATTGGG ATAACTTCGG AGATATCACC GAAGATATCT TGGTGTTTTT CATCGTTGCC 1380 

ACCTACGGTG AGGGTGAACC TACCGACAAT GCCGACGAGT TCCACACCTG GTTGACTGAA 1440 

GAAGCTGACA CTTTGAGTAC TTTGAGATAT ACCGTGTTCG GGTTGGGTAA CTCCACCTAC 1500 

GAGTTCTTCA ATGCTATTGG TAGAAAGTTT GACAGATTGT TGAGTGAGAA AGGTGGTGAC 1560 

AGATTTGCTG AATATGCTGA AGGTGACGAC GGCACTGGCA CCTTGGACGA AGATTTCATG 1620 

GCCTGGAAGG ATAATGTCTT TGACGCCTTG AAGAATGACT TGAACTTTGA AGAAAAGGAA 1680 

TTGAAGTACG AACCAAACGT GAAATTGACT GAGAGAGATG ACTTGTCTGC TGCCGACTCC 1740 

CAAGTTTCCT TGGGTGAGCC AAACAAGAAG TACATCAACT CCGAGGGCAT CGACTTGACC 1800 

AAGGGTCCAT TCGACCACAC CCACCCATAC TTGGCCAGGA TCACCGAGAC CAGAGAGTTG 1860 

TTCAGCTCCA AGGAAAGACA CTGTATTCAC GTTGAATTTG ACATTTCTGA ATCGAACTTG 1920

AAATACACCA CCGGTGACCA TCTAGCCATC TGGCCATCCA ACTCCGACGA AAACATCAAG 1980 

CAATTTGCCA AGTGTTTCGG ATTGGAAGAT AAACTCGACA CTGTTATTGA ATTGAAGGCA 2040 

TTGGACTCCA CTTACACCAT TCCATTCCCA ACTCCAATTA CTTACGGTGC TGTCATTAGA 2100 

CACCATTTAG AAATCTCCGG TCCAGTCTCG AGACAATTCT TTTTGTCGAT TGCTGGGTTT 2160 

GCTCCTGATG AAGAAACAAA GAAGACTTTC ACCAGACTTG GTGGTGACAA ACAAGAATTC 2220 

GCCACCAAGG TTACCCGCAG AAAGTTCAAC ATTGCCGATG CCTTGTTATA TTCCTCCAAC 2280 

AACACTCCAT GGTCCGATGT TCCTTTTGAG TTCCTTATTG AAAACATCCA ACACTTGACT 2340 

CCACGTTACT ACTCCATTTC TTCTTCGTCG TTGAGTGAAA AACAACTCAT CAATGTTACT 2400 

GCAGTCGTTG AGGCCGAAGA AGAAGCCGAT GGCAGACCAG TCACTGGTGT TGTTACCAAC 2460 

TTGTTGAAGA ACATTGAAAT TGCGCAAAAC AAGACTGGCG AAAAGCCACT TGTTCACTAC 2520 

GATTTGAGCG GCCCAAGAGG CAAGTTCAAC AAGTTCAAGT TGCCAGTGCA CGTGAGAAGA 2580 

TCCAACTTTA AGTTGCCAAA GAACTCCACC ACCCCAGTTA TCTTGATTGG TCCAGGTACT 2640 

GGTGTTGCCC CATTGAGAGG TTTCGTTAGA GAAAGAGTTC AACAAGTCAA GAATGGTGTC 2700 

AATGTTGGCA AGACTTTGTT GTTTTATGGT TGCAGAAACT CCAACGAGGA CTTTTTGTAC 2760 

AAGCAAGAAT GGGCCGAGTA CGCTTCTGTT TTGGGTGAAA ACTTTGAGAT GTTCAATGCC 2820 

TTCTCTAGAC AAGACCCATC CAAGAAGGTT TACGTCCAGG ATAAGATTTT AGAAAACAGC 2880 

CAACTTGTGC ACGAATTGTT GACCGAAGGT GCCATTATCT ACGTCTGTGG TGACGCCAGT 2940 

AGAATGGCCA GAGACGTCCA GACCACGATC TCCAAGATTG TTGCCAAAAG CAGAGAAATC 3000 

AGTGAAGACA AGGCCGCTGA ATTGGTCAAG TCCTGGAAAG TCCAAAATAG ATACCAAGAA 3060 

GATGTTTGGT AGACTCAAAC GAATCTCTCT TTCTCCCAAC GCATTTATGA ATATTCTCAT 3120 

TGAAGTTTTA CATATGTTCT ATATTTCATT TTTTTTTTAT TATATTACGA AACATAGGTC 3180 

AACTATATAT ACTTGATTAA ATGTTATAGA AACAATAATT ATTATCTACT CGTCTACTTC 3240 

TTTGGCATTG GCATTGGCAT TGGCATTGGC ATTGCCGTTG CCGTTGGTAA TGCCGGGATA 3300 

TTTAGTACAG TATCTCCAAT CCGGATTTGA GCTATTGTAA ATCAGCTGCA AGTCATTCTC 3360 

CACCTTCAAC CAGTACTTAT ACTTCATCTT TGACTTCAAG TCCAAGTCAT AAATATTACA 3420 

AGTTAGCAAG AACTTCTGGC CATCCACAAT ATAGACGTTA TTCACGTTAT TATGCGACGT 3480 

ATGGATATGG TTATCCTTAT TGAACTTCTC AAACTTCAAA AACAACCCCA CGTCCCGCAA 3540 

CGTCATTATC AACGACAAGT TCTGACTCAC GTCGTCGGAG CTCGTCAAGT TCTCAATTAG 3600 

ATCGTTCTTG TTATTGATCT TCTGGTACTT TCTCAACTGC TGGAACACAT TGTCCTCGTT 3660 

GTTCAAATAG ATCTTGAACA ACTTCTTCAA GGGAATCAAC TTTTCGATCT GGGCCAAGAT 3720 

TTCCGCCGGG ATCTTCAGAA ACAAGTCCTG CAACCCCTGG TCGATGGTCT CGGGGTACAA 3780 

CAAGTCTAAG GGGCAGAAGT GTCTAGGCAC GTGTTTCAAC TGGTTCAAGG AACATGTTCG 3840 

ACAGTAGTTC GAGTTATAGT TATCGTACAA CCACTTTGGC TTGATTTCGA AAATGACGGA 3900 

GCTGATCCCA TCATTCTCCT GGTTCCTTTC ATAGTACAAC TGGCATTTCT TCGAGAGACT 3960 

CAACTCCTCG TAGTTCCCGT CCAAGATATT CGGCAACAAG AGCCCGTAGC GCTCACGGAG 4020 

CATCAAGTCG TGGCCCTGGT TGTTCAACTT GTTGATGAAG TCCGATGTCA AGACAATCAA 4080 

CTGGATGTCG ATGATCTGGT GCGGAAACAA GTTCTTGCAC TTTAGCTCGA TGAAGTCGTA 4140 

CAACT 4145 



(2) INFORMATION FOR SEQ ID NO: 83: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 679 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: unknown 
(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:83: 

Met Ala Leu Asp Lys Leu Asp Leu Tyr Val Ile Ile Thr Leu Val Val

1 5 10 15

Ala Val Ala Ala Tyr Phe Ala Lys Asn Gln Phe Leu Asp Gln Pro Gln

20 25 30 

Asp Thr Gly Phe Leu Asn Thr Asp Ser Gly Ser Asn Ser Arg Asp Val 

35 40 45 

Leu Leu Thr Leu Lys Lys Asn Asn Lys Asn Thr Leu Leu Leu Phe Gly 
50 55 60 

Phe
Ser

Lys
Glu Ile

650
Glu
Val

Lys Ser Trp Lys
665

670

Trp

515

Val Ile Leu Ile Gly Pro
530

Val Arg Glu Arg Val Gln
545 550

Thr Leu Leu Phe Tyr Gly

565

Lys Gln Glu Trp Ala Glu

580

Met Phe Asn Ala Phe Ser
595

Gln Asp Lys Ile Leu Glu
610

Glu Gly Ala Ile Ile Tyr

625 630
Asp Val Gln Thr Thr Ile

645

Ser Glu Asp Lys Ala Ala

660

Arg Tyr Gln Glu Asp Val
675

(2) INFORMATION FOR SEQ ID NO: 84: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 679 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: unknown 
(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 84: 

Met Ala Leu Asp Lys Leu Asp Leu Tyr Val Ile Ile Thr Leu Val Val

1 5 10 15

Ala Val Ala Ala Tyr Phe Ala Lys Asn Gln Phe Leu Asp Gln Pro Gln

20 25 30

Asp Thr Gly Phe Leu Asn Thr Asp Ser Gly Ser Asn Ser Arg Asp Val

35 40 45

Leu Leu Thr Leu Lys Lys Asn Asn Lys Asn Thr Leu Leu Leu Phe Gly

50 55 60

Ser Gln Thr Gly Thr Ala Glu Asp Tyr Ala Asn Lys Leu Ser Arg Glu

65 70 75 80

Leu His Ser Arg Phe Gly Leu Lys Thr Met Val Ala Asp Phe Ala Asp

85 90 95

Tyr Asp Trp Asp Asn Phe Gly Asp Ile Thr Glu Asp Ile Leu Val Phe

100 105 110

Phe Ile Val Ala Thr Tyr Gly Glu Gly Glu Pro Thr Asp Asn Ala Asp

115 120 125

Glu Phe His Thr Trp Leu Thr Glu Glu Ala Asp Thr Leu Ser Thr Leu

130 135 140

Arg Tyr Thr Val Phe Gly Leu Gly Asn Ser Thr Tyr Glu Phe Phe Asn

145 150 155 160

Ala Ile Gly Arg Lys Phe Asp Arg Leu Leu Ser Glu Lys Gly Gly Asp

165 170 175

Arg Phe Ala Glu Tyr Ala Glu Gly Asp Asp Gly Thr Gly Thr Leu Asp

180 185 190

Glu Asp Phe Met Ala Trp Lys Asp Asn Val Phe Asp Ala Leu Lys Asn

195 200 205

Asp Leu Asn Phe Glu Glu Lys Glu Leu Lys Tyr Glu Pro Asn Val Lys

210 215 220

Leu Thr Glu Arg Asp Asp Leu Ser Ala Ala Asp Ser Gln Val Ser Leu

225 230 235 240

Gly Glu Pro Asn Lys Lys Tyr Ile Asn Ser Glu Gly Ile Asp Leu Thr

245 250 255

Lys Gly Pro Phe Asp His Thr His Pro Tyr Leu Ala Arg Ile Thr Glu

260 265 270

Thr Arg Glu Leu Phe Ser Ser Lys Glu Arg His Cys Ile His Val Glu

275 280 285

Phe Asp Ile Ser Glu Ser Asn Leu Lys Tyr Thr Thr Gly Asp His Leu

290 295 300

Ala Ile Trp Pro Ser Asn Ser Asp Glu Asn Ile Lys Gln Phe Ala Lys

305 310 315 320

Cys Phe Gly Leu Glu Asp Lys Leu Asp Thr Val Ile Glu Leu Lys Ala

325 330 335

Leu Asp Ser Thr Tyr Thr Ile Pro Phe Pro Thr Pro Ile Thr Tyr Gly

340 345 350

Ala Val Ile Arg His His Leu Glu Ile Ser Gly Pro Val Ser Arg Gln

355 360 365

Phe Phe Leu Ser Ile Ala Gly Phe Ala Pro Asp Glu Glu Thr Lys Lys

370 375 380

Thr Phe Thr Arg Leu Gly Gly Asp Lys Gln Glu Phe Ala Thr Lys Val

385 390 395 400

Thr Arg Arg Lys Phe Asn Ile Ala Asp Ala Leu Leu Tyr Ser Ser Asn

405 410 415

Asn Thr Pro Trp Ser Asp Val Pro Phe Glu Phe Leu Ile Glu Asn Ile

420 425 430

Gln His Leu Thr Pro Arg Tyr Tyr Ser Ile Ser Ser Ser Ser Leu Ser

435 440 445

Glu Lys Gln Leu Ile Asn Val Thr Ala Val Val Glu Ala Glu Glu Glu

450 455 460

Ala Asp Gly Arg Pro Val Thr Gly Val Val Thr Asn Leu Leu Lys Asn

465 470 475 480

Ile Glu Ile Ala Gln Asn Lys Thr Gly Glu Lys Pro Leu Val His Tyr

485 490 495

Asp Leu Ser Gly Pro Arg Gly Lys Phe Asn Lys Phe Lys Leu Pro Val

500 505 510

His Val Arg Arg Ser Asn Phe Lys Leu Pro Lys Asn Ser Thr Thr Pro

515 520 525

Val Ile Leu Ile Gly Pro Gly Thr Gly Val Ala Pro Leu Arg Gly Phe

530 535 540

Val Arg Glu Arg Val Gln Gln Val Lys Asn Gly Val Asn Val Gly Lys

545 550 555 560

Thr Leu Leu Phe Tyr Gly Cys Arg Asn Ser Asn Glu Asp Phe Leu Tyr

565 570 575

Lys Gln Glu Trp Ala Glu Tyr Ala Ser Val Leu Gly Glu Asn Phe Glu

580 585 590

Met Phe Asn Ala Phe Ser Arg Gln Asp Pro Ser Lys Lys Val Tyr Val

595 600 605

Gln Asp Lys Ile Leu Glu Asn Ser Gln Leu Val His Glu Leu Leu Thr

610 615 620

Glu Gly Ala Ile Ile Tyr Val Cys Gly Asp Ala Ser Arg Met Ala Arg
625 630 635 640

Asp Val Gln Thr Thr Ile Ser Lys Ile Val Ala Lys Ser Arg Glu Ile

645 650 655

Ser Glu Asp Lys Ala Ala Glu Leu Val Lys Ser Trp Lys Val Gln Asn

660 665 670

Arg Tyr Gln Glu Asp Val Trp
675

(2) INFORMATION FOR SEQ ID NO: 85: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4115 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 85: 

CATATGCGCT AATCTTCTTT TTCTTTTTAT CACAGOAGAA ACTATCCCAC CCCCACTTCG 60 

AAACACAATG ACAACTCCTG CGTAACTTGC AAATTCTTGT CTGACTAATT GAAAACTCCG 120 

GACGAGTCAG ACCTCCAGTC AAACGGACAG ACAGACAAAC ACTTGGTGCG ATGTTCATAC 180 

CTACAGACAT GTCAACGGGT GTTAGACGAC GGTTTCTTGC AAAGACAGGT GTTGGCATCT 240 

CGTACGATGG CAACTGCAGG AGGTGTCGAC TTCTCCTTTA GGCAATAGAA AAAGACTAAG 300 

AGAACAGCGT TTTTACAGGT TGCATTGGTT AATGTAGTAT TTTTTTAGTC CCAGCATTCT 360 

GTGGGTTGCT CTGGGTTTCT AGAATAGGAA ATCACAGGAG AATGCAAATT CAGATGGAAG 420 

AACAAAGAGA TAAAAAACAA AAAAAAACTG AGTTTTGCAC CAATAGAATG TTTGATGATA 480 

TCATCCACTC GCTAAACGAA TCATGTGGGT GATCTTCTCT TTAGTTTTGG TCTATCATAA 540 

AACACATGAA AGTGAAATCC AAATACACTA CACTCCGGGT ATTGTCCTTC GTTTTACAGA 600 

TGTCTCATTG TCTTACTTTT GAGGTCATAG GAGTTGCCTG TGAGAGATCA CAGAGATTAT 660 

CACACTCACA TTTATCGTAG TTTCCTATCT CATGCTGTGT GTCTCTGGTT GGTTCATGAG 720 

TTTGGATTGT TGTACATTAA AGGAATCGCT GGAAAGCAAA GCTAACTAAA TTTTCTTTGT 780 

CACAGGTACA CTAACCTGTA AAACTTCACT GCCACGCCAG TCTTTCCTGA TTGGGCAAGT 840 

GCACAAACTA CAACCTGCAA AACAGCACTC CGCTTGTCAC AGGTTGTCTC CTCTCAACCA 900 

ACAAAAAAAT AAGATTAAAC TTTCTTTGCT CATGCATCAA TCGGAGTTAT CTCTGAAAGA 960 

GTTGCCTTTG TGTAATGTGT GCCAAACTCA AACTGCAAAA CTAACCACAG AATGATTTCC 1020 

CTCACAATTA TATAAACTCA CCCACATTTC CACAGACCGT AATTTCATGT CTCACTTTCT 1080 

CTTTTGCTCT TCTTTTACTT AGTCAGGTTT GATAACTTCC TTTTTTATTA CCCTATCTTA 1140 

TTTATTTATT TATTCATTTA TACCAACCAA CCAACCATGG CCACACAAGA AATCATCGAT 1200 

TCTGTACTTC CGTACTTGAC CAAATGGTAC ACTGTGATTA CTGCAGCAGT ATTAGTCTTC 1260 

CTTATCTCCA CAAACATCAA GAACTACGTC AAGGCAAAGA AATTGAAATG TGTCGATCCA 1320 

CCATACTTGA AGGATGCCGG TCTCACTGGT ATTCTGTCTT TGATCGCCGC CATCAAGGCC 1380 

AAGAACGACG GTAGATTGGC TAACTTTGCC GATGAAGTTT TCGACGAGTA CCCAAACCAC 1440 

ACCTTCTACT TGTCTGTTGC CGGTGCTTTG AAGATTGTCA TGACTGTTGA CCCAGAAAAC 1500 

ATCAAGGCTG TCTTGGCCAC CCAATTCACT GACTTCTCCT TGGGTACCAG ACACGCCCAC 1560 

TTTGCTCCTT TGTTGGGTGA CGGTATCTTC ACCTTGGACG GAGAAGGTTG GAAGCACTCC 1620 

AGAGCTATGT TGAGACCACA GTTTGCTAGA GACCAGATTG GACACGTTAA AGCCTTGGAA 1680 

CCACACATCC AAATCATGGC TAAGCAGATC AAGTTGAACC AGGGAAAGAC TTTCGATATC 1740 

CAAGAATTGT TCTTTAGATT TACCGTCGAC ACCGCTACTG AGTTCTTGTT TGGTGAATCC 1800 

GTTCACTCCT TGTACGATGA AAAATTGGGC ATCCCAACTC CAAACGAAAT CCCAGGAAGA 1860 

GAAAACTTTG CCGCTGCTTT CAACGTTTCC CAACACTACT TGGCCACCAG AAGTTACTCC 1920 

CAGACTTTTT ACTTTTTGAC CAACCCTAAG GAATTCAGAG ACTGTAACGC CAAGGTCCAC 1980 

CACTTGGCCA AGTACTTTGT CAACAAGGCC TTGAACTTTA CTCCTGAAGA ACTCGAAGAG 2040 

AAATCCAAGT CCGGTTACGT TTTCTTGTAC GAATTGGTTA AGCAAACCAG AGATCCAAAG 2100 

GTCTTGCAAG ATCAATTGTT GAACATTATG GTTGCCGGAA GAGACACCAC TGCCGGTTTG 2160 

TTGTCCTTTG CTTTGTTTGA ATTGGCTAGA CACCCAGAGA TGTGGTCCAA GTTGAGAGAA 2220 

GAAATCGAAG TTAACTTTGG TGTTGGTGAA GACTCCCGCG TTGAAGAAAT TACCTTCGAA 2280 

GCCTTGAAGA GATGTGAATA CTTGAAGGCT ATCCTTAACG AAACCTTGCG TATGTACCCA 2340 

TCTGTTCCTG TCAACTTTAG AACCGCCACC AGAGACACCA CTTTGCCAAG AGGTGGTGGT 2400 

GCTAACGGTA CCGACCCAAT CTACATTCCT AAAGGCTCCA CTGTTGCTTA CGTTGTCTAC 2460 
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AAGACCCACC GTTTGGAAGA ATACTACGGT AAGGACGCTA ACGACTTCAG ACCAGAAAGA 2520 

TGGTTTGAAC CATCTACTAA GAAGTTGGGC TGGGCTTATG TTCCATTCAA CGGTGGTCCA 2580 

AGAGTCTGCT TGGGTCAACA ATTCGCCTTG ACTGAAGCTT CTTATGTGAT CACTAGATTG 2640 

GCCCAGATGT TTGAAACTGT CTCATCTGAT CCAGGTCTCG AATACCCTCC ACCAAAGTGT 2700 

ATTCACTTGA CCATGAGTCA CAACGATGGT GTCTTTGTCA AGATGTAAAG TAGTCGATGC 2760 

TGGGTATTCG ATTACATGTG TATAGGAAGA TTTTGGTTTT TTATTCGTTC TTTTTTTTAA 2820 

TTTTTGTTAA ATTAGTTTAG AGATTTCATT AATACATAGA TGGGTGCTAT TTCCGAAACT 2880 

TTACTTCTAT CCCCTGTATC CCTTATTATC CCTCTCAGTC ACATGATTGC TGTAATTGTC 2940 

GTGCAGGACA CAAACTCCCT AACGGACTTA AACCATAAAC AAGCTCAGAA CCATAAGCCG 3000 

ACATCACTCC TTCTTCTCTC TTCTCCAACC AATAGCATGG ACAGACCCAC CCTCCTATCC 3060 

GAATCGAAGA CCCTTATTGA CTCCATACCC ACCTGGAAGC CCCTCAAGCC ACACACGTCA 3120

TCCAGCCCAC CCATCACCAC ATCCCTCTAC TCGACAACGT CCAAAGACGG CGAGTTCTGG 3180 

TGTGCCCGGA AATCAGCCAT CCCGGCCACA TACAAGCAGC CGTTGATTGC GTGCATACTC 3240 

GGCGAGCCCA CAATGGGAGC CACGCATTCG GACCATGAAG CAAAGTACAT TCACGAGATC 3300 

ACGGGTGTTT CAGTGTCGCA GATTGAGAAG TTCGACGATG GATGGAAGTA CGATCTCGTT 3360 

GCGGATTACG ACTTCGGTGG GTTGTTATCT AAACGAAGAT TCTATGAGAC GCAGCATGTG 3420 

TTTCGGTTCG AGGATTGTGC GTACGTCATG AGTGTGCCTT TTGATGGACC CAAGGAGGAA 3480 

GGTTACGTGG TTGGGACGTA CAGATCCATT GAAAGGTTGA GCTGGGGTAA AGACGGGGAC 3540 

GTGGAGTGGA CCATGGCGAC GACGTCGGAT CCTGGTGGGT TTATCCCGCA ATGGATAACT 3600 

CGATTGAGCA TCCCTGGAGC AATCGCAAAA GATGTGCCTA GTGTATTAAA CTACATACAG 3660 

AAATAAAAAC GTGTCTTGAT TCATTGGTTT GGTTCTTGTT GGGTTCCGAG CCAATATTTC 3720 

ACATCATCTC CTAAATTCTC CAAGAATCCC AACGTAGCGT AGTCCAGCAC GCCCTCTGAG 3780 

ATCTTATTTA ATATCGACTT CTCAACCACC GGTGGAATCC CGTTCAGACC ATTGTTACCT 3840 

GTAGTGTGTT TGCTCTTGTT CTTGATGACA ATGATGTATT TGTCACGATA CCTGAAATAA 3900 

TAAAACATCC AGTCATTGAG CTTATTACTC GTGAACTTAT GAAAGAACTC ATTCAAGCCG 3960 

TTCCCAAAAA ACCCAGAATT GAAGATCTTG CTCAACTGGT CATGCAAGTA GTAGATCGCC 4020 

ATGATCTGAT ACTTTACCAA GCTATCCTCT CCAAGTTCTC CCACGTACGG CAAGTACGGC 4080 

AACGAGCTCT GGAAGCTTTG TTGTTTGGGG TCATA 4115 

(2) INFORMATION FOR SEQ ID NO: 86: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3948 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 86: 

GACCTGTGAC GCTTCCGGTG TCTTGCCACC AGTCTCCAAG TTGACCGACG CCCAAGTCAT 60 

GTACCACTTT ATTTCCGGTT ACACTTCCAA GATGGCTGGT ACTGAAGAAG GTGTCACGGA 120 

ACCACAAGCT ACTTTCTCCG CTTGTTTCGG TCAACCATTC TTGGTGTTGC ACCCAATGAA 180 
GTACGCTCAA CAATTGTCTG ACAAGATCTC GCAACACAAG GCTAACGCCT GGTTGTTGAA 240

CACCGGTTGG GTTGGTTCTT CTGCTGCTAG AGGTGGTAAG AGATGCTCAT TGAAGTACAC 300 

CAGAGCCATT TTGGACGCTA TCCACTCTGG TGAATTGTCC AAGGTTGAAT ACGAAACTTT 360 

CCCAGTCTTC AACTTGAATG TCCCAACCTC CTGTCCAGGT GTCCCAAGTG AAATCTTGAA 420 

CCCAACCAAG GCCTGGACCG GAAGGTGTTG ACTCCTTCAA CAAGGAAATC AAGTCTTTGG 480 

CTGGTAAGTT TGCTGAAAAC TTCAAGACCT ATGCTGACCA AGCTACCGCT GAAGTGAGAG 540 

CTGCAGGTCC AGAAGCTTAA AGATATTTAT TCATTATTTA GTTTGCCTAT TTATTTCTCA 600 

TTACCCATCA TCATTCAACA CTATATATAA AGTTACTTCG GATATCATTG TAATCGTGCG 660 

TGTCGCAATT GGATGATTTG GAACTGCGCT TGAAACGGAT TCATGCACGA AGCGGAGATA 720 

AAAGATTACG TAATTTATCT CCTGAGACAA TTTTAGCCGT GTTCACACGC CCTTCTTTGT 780 

TCTGAGCGAA GGATAAATAA TTAGACTTCC ACAGCTCATT CTAATTTCCG TCACGCGAAT 840 

ATTGAAGGGG GGTACATGTG GCCGCTGAAT GTGGGGGCAG TAAACGCAGT CTCTCCTCTC 900 

CCAGGAATAG TGCAACGGAG GAAGGATAAC GGATAGAAAG CGGAATGCGA GGAAAATTTT 960 

GAACGCGCAA GAAAAGCAAT ATCCGGGCTA CCAGGTTTTG AGCCAGGGAA CACACTCCTA 1020 

TTTCTGCTCA ATGACTGAAC ATAGAAAAAA CACCAAGACG CAATGAAACG CACATGGACA 1080 

TTTAGACCTC CCCACATGTG ATAGTTTGTC TTAACAGAAA AGTATAATAA GAACCCATGC 1140 



CGTCCCTTTT CTTTCGCCGC TTCAACTTTT TTTTTTTTAT CTTACACACA TCACGACCAT 1200 

GACTGTACAC GATATTATCG CCACATACTT CACCAAATGG TACGTGATAG TACCACTCGC 1260 

TTTGATTGCT TATAGAGTCC TCGACTACTT CTATGGCAGA TACTTGATGT ACAAGCTTGG 1320 

TGCTAAACCA TTTTTCCAGA AACAGACAGA CGGCTGTTTC GGATTCAAAG CTCCGCTTGA 1380 

ATTGTTGAAG AAGAAGAGCG ACGGTACCCT CATAGACTTC ACACTCCAGC GTATCCACGA 1440 

TCTCGATCGT CCCGATATCC CAACTTTCAC ATTCCCGGTC TTTTCCATCA ACCTTGTCAA 1500 

TACCCTTGAG CCGGAGAACA TCAAGGCCAT CTTGGCCACT CAGTTCAACG ATTTCTCCTT 1560 

GGGTACCAGA CACTCGCACT TTGCTCCTTT GTTGGGTGAT GGTATCTTTA CGTTGGATGG 1620 

CGCCGGCTGG AAGCACAGCA GATCTATGTT GAGACCACAG TTTGCCAGAG AACAGATTTC 1680 

CCACGTCAAG TTGTTGGAGC CACACGTTCA GGTGTTCTTC AAACACGTCA GAAAGGCACA 1740 

GGGCAAGACT TTTGACATCC AGGAATTGTT TTTCAGATTG ACCGTCGACT CCGCCACCGA 1800 

GTTTTTGTTT GGTGAATCCG TTGAGTCCTT GAGAGATGAA TCTATCGGCA TGTCCATCAA 1860 

TGCGCTTGAC TTTGACGGCA AGGCTGGCTT TGCTGATGCT TTTAACTATT CGCAGAATTA 1920 

TTTGGCTTCG AGAGCGGTTA TGCAACAATT GTACTGGGTG TTGAACGGGA AAAAGTTTAA 1980 

GGAGTGCAAC GCTAAAGTGC ACAAGTTTGC TGACTACTAC GTCAACAAGG CTTTGGACTT 2040 

GACGCCTGAA CAATTGGAAA AGCAGGATGG TTATGTGTTT TTGTACGAAT TGGTCAAGCA 2100 

AACCAGAGAC AAGCAAGTGT TGAGAGACCA ATTGTTGAAC ATCATGGTTG CTGGTAGAGA 2160 

CACCACCGCC GGTTTGTTGT CGTTTGTTTT CTTTGAATTG GCCAGAAACC CAGAAGTTAC 2220 

CAACAAGTTG AGAGAAGAAA TTGAGGACAA GTTTGGACTC GGTGAGAATG CTAGTGTTGA 2280 

AGACATTTCC TTTGAGTCGT TGAAGTCCTG TGAATACTTG AAGGCTGTTC TCAACGAAAC 2340 

CTTGAGATTG TACCCATCCG TGCCACAGAA TTTCAGAGTT GCCACCAAGA ACACTACCCT 2400 

CCCAAGAGGT GGTGGTAAGG ACGGGTTGTC TCCTGTTTTG GTGAGAAAGG GTCAGACCGT 2460 

TATTTACGGT GTCTACGCAG CCCACAGAAA CCCAGCTGTT TACGGTAAGG ACGCTCTTGA 2520 

GTTTAGACCA GAGAGATGGT TTGAGCCAGA GACAAAGAAG CTTGGCTGGG CCTTCCTCCC 2580 

ATTCAACGGT GGTCCAAGAA TCTGTTTGGG ACAGCAGTTT GCCTTGACAG AAGCTTCGTA 2640 

TGTCACTGTC AGGTTGCTCC AGGAGTTTGC ACACTTGTCT ATGGACCCAG ACACCGAATA 2700 

TCCACCTAAG AAAATGTCGC ATTTGACCAT GTCGCTTTTC GACGGTGCCA ATATTGAGAT 2760 

GTATTAGAGG GTCATGTGTT ATTTTGATTG TTTAGTTTGT AATTACTGAT TAGGTTAATT 2820 

CATGGATTGT TATTTATTGA TAGGGGTTTG CGCGTGTTGC ATTCACTTGG GATCGTTCCA 2880 

GGTTGATGTT TCCTTCCATC CTGTCGAGTC AAAAGGAGTT TTGTTTTGTA ACTCCGGACG 2940 

ATGTTTTAAA TAGAAGGTCG ATCTCCATGT GATTGTTTTG ACTGTTACTG TGATTATGTA 3000 

ATCTGCGGAC GTTATACAAG CATGTGATTG TGGTTTTGCA GCCTTTTGCA CGACAAATGA 3060 

TCGTCAGACG ATTACGTAAT CTTTGTTAGA GGGGTAAAAA AAAACAAAAT GGCAGCCAGA 3120 

ATTTCAAACA TTCTGCAAAC AATGCAAAAA ATGGGAAACT CCAACAGACA AAAAAAAAAA 3180 

CTCCGCAGCA CTCCGAACCC ACAGAACAAT GGGGCGCCAG AATTATTGAC TATTGTGACT 3240 

TTTTTACGCT AACGCTCATT GCAGTGTAGT GCGTCTTACA CGGGGTATTG CTTTCTACAA 3300 

TGCAAGGGCA CAGTTGAAGG TTTGCACCTA ACGTTGCCCC GTGTCAACTC AATTTGACGA 3360 

GTAACTTCCT AAGCTCGAAT TATGCAGCTC GTGCGTCAAC CTATGTGCAG GAAAGAAAAA 3420 

ATCCAAAAAA ATCGAAAATG CGACTTTCGA TTTTGAATAA ACCAAAAAGA AAAATGTCGC 3480

ACTTTTTTCT CGCTCTCGCT CTCTCGACCC AAATCACAAC AAATCCTCGC GCGCAGTATT 3540 

TCGACGAAAC CACAACAAAT AAAAAAAACA AATTCTACAC CACTTCTTTT TCTTCACCAG 3600 

TCAACAAAAA ACAACAAATT ATACACCATT TCAACGATTT TTGCTCTTAT AAATGCTATA 3660 

TAATGGTTTA ATTCAACTCA GGTATGTTTA TTTTACTGTT TTCAGCTCAA GTATGTTCAA 3720 

ATACTAACTA CTTTTGATGT TTGTCGCTTT TCTAGAATCA AAACAACGCC CACAACACGC 3780 

CGAGCTTGTC GAATAGACGG TTTGTTTACT CATTAGATGG TCCCAGATTA CTTTTCAAGC 3840 

CAAAGTCTCT CGAGTTTTGT TTGCTGTTTC CCCAATTCCT AACTATGAAG GGTTTTTATA 3900 

AGGTCCAAAG ACCCCAAGGC ATAGTTTTTT TGGTTCCTTC TTGTCGTG 3948 

(2) INFORMATION FOR SEQ ID NO: 87: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3755 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 87:

GCTCAACAAT TGTCTGACAA GATCTCGCAA CACAAGGCTA ACGCCTGGTT GTTGAACACT 60 

GGTTGGGTTG GTTCTTCTGC TGCTAGAGGT GGTAAGAGAT GTTCATTGAA GTACACCAGA 120 

GCCATTTTGG ACGCTATCCA CTCTGGTGAA TTGTCCAAGG TTGAATACGA GACTTTCCCA 180 

GTCTTCAACT TGAATGTCCC AACCTCCTGC CCAGGTGTCC CAAGTGAAAT CTTGAACCCA 240 

ACCAAGGCCT GGACCGAAGG TGTTGACTCC TTCAACAAGG AAATCAAGTC TTTGGCTGGT 300 

AAGTTTGCTG AAAACTTCAA GACCTATGCT GACCAAGCTA CCGCTGAAGT TAGAGCTGCA 360 

GGTCCAGAAG CTTAAAGATA TTTATTCACT ATTTAGTTTG CCTATTTATT TCTCATCACC 420 

CATCATCATT CAACAATATA TATAAAGTTA TTTCGGAACT CATATATCAT TGTAATCGTG 480 

CGTGTTGCAA TTGGGTAATT TGAAACTGTA GTTGGAACGG ATTCATGCAC GATGCGGAGA 540 

TAACACGAGA TTATCTCCTA AGACAATTTT GGCCTCATTC ACACGCCCTT CTTCTGAGCT 600 

AAGGATAAAT AATTAGACTT CACAAGTTCA TTAAAATATC CGTCACGCGA AAACTGCAAC 660 

AATAAGGAAG GGGGGGGTAG ACGTAGCCGA TGAATGTGGG GTGCCAGTAA ACGCAGTCTC 720 

TCTCTCCCCC CCCCCCCCCC CCCCCTCAGG AATAGTACAA CGGGGGAAGG ATAACGGATA 780 

GCAAGTGGAA TGCGAGGAAA ATTTTGAATG CGCAAGGAAA GCAATATCCG GGCTATCAGG 840 

TTTTGAGCCA GGGGACACAC TCCTCTTCTG CACAAAAACT TAACGTAGAC AAAAAAAAAA 900 

AACTCCACCA AGACACAATG AATCGCACAT GGACATTTAG ACCTCCCCAC ATGTGAAAGC 960 

TTCTCTGGCG AAAGCAAAAA AAGTATAATA AGGACCCATG CCTTCCCTCT TCCTGGGCCG 1020 

TTTCAACTTT TTCTTTTTCT TTGTCTATCA ACACACACAC ACCTCACGAC CATGACTGCA 1080 

CAGGATATTA TCGCCACATA CATCACCAAA TGGTACGTGA TAGTACCACT CGCTTTGATT 1140 

GCTTATAGGG TCCTCGACTA CTTTTACGGC AGATACTTGA TGTACAAGCT TGGTGCTAAA 1200 

CCGTTTTTCC AGAAACAAAC AGACGGTTAT TTCGGATTCA AAGCTCCACT TGAATTGTTA 1260 

AAAAAGAAGA GTGACGGTAC CCTCATAGAC TTCACTCTCG AGCGTATCCA AGCGCTCAAT 1320 

CGTCCAGATA TCCCAACTTT TACATTCCCA ATCTTTTCCA TCAACCTTAT CAGCACCCTT 1380 

GAGCCGGAGA ACATCAAGGC TATCTTGGCC ACCCAGTTCA ACGATTTCTC CTTGGGCACC 1440 

AGACACTCGC ACTTTGCTCC TTTGTTGGGC GATGGTATCT TTACCTTGGA CGGTGCCGGC 1500 

TGGAAGCACA GCAGATCTAT GTTGAGACCA CAGTTTGCCA GAGAACAGAT TTCCCACGTC 1560 

AAGTTGTTGG AGCCACACAT GCAGGTGTTC TTCAAGCACG TCAGAAAGGC ACAGGGCAAG 1620 

ACTTTTGACA TCCAAGAATT GTTTTTCAGA TTGACCGTCG ACTCCGCCAC TGAGTTTTTG 1680 

TTTGGTGAAT CCGTTGAGTC CTTGAGAGAT GAATCTATTG GGATGTCCAT CAATGCACTT 1740 

GACTTTGACG GCAAGGCTGG CTTTGCTGAT GCTTTTAACT ACTCGCAGAA CTATTTGGCT 1800 

TCGAGAGCGG TTATGCAACA ATTGTACTGG GTGTTGAACG GGAAAAAGTT TAAGGAGTGC 1860

AACGCTAAAG TGCACAAGTT TGCTGACTAT TACGTCAGCA AGGCTTTGGA CTTGACACCT 1920 

GAACAATTGG AAAAGCAGGA TGGTTATGTG TTCTTGTACG AGTTGGTCAA GCAAACCAGA 1980 

GACAGGCAAG TGTTGAGAGA CCAGTTGTTG AACATCATGG TTGCCGGTAG AGACACCACC 2040 

GCCGGTTTGT TGTCGTTTGT TTTCTTTGAA TTGGCCAGAA ACCCAGAGGT GACCAACAAG 2100 

TTGAGAGAAG AAATCGAGGA CAAGTTTGGT CTTGGTGAGA ATGCTCGTGT TGAAGACATT 2160 

TCCTTTGAGT CGTTGAAGTC ATGTGAATAC TTGAAGGCTG TTCTCAACGA AACTTTGAGA 2220 

TTGTACCCAT CCGTGCCACA GAATTTCAGA GTTGCCACCA AAAACACTAC CCTTCCAAGG 2280 

GGAGGTGGTA AGGACGGGTT ATCTCCTGTT TTGGTCAGAA AGGGTCAAAC CGTTATGTAC 2340 

GGTGTCTACG CTGCCCACAG AAACCCAGCT GTCTACGGTA AGGACGCCCT TGAGTTTAGA 2400 

CCAGAGAGGT GGTTTGAGCC AGAGACAAAG AAGCTTGGCT GGGCCTTCCT TCCATTCAAC 2460 

GGTGGTCCAA GAATTTGCTT GGGACAGCAG TTTGCCTTGA CAGAAGCTTC GTATGTCACT 2520 

GTCAGATTGC TCCAAGAGTT TGGACACTTG TCTATGGACC CCAACACCGA ATATCCACCT 2580 

AGGAAAATGT CGCATTTGAC CATGTCCCTT TTCGACGGTG CCAACATTGA GATGTATTAG 2640 

AGGATCATGT GTTATTTTTG ATTGGTTTAG TCTGTTTGTA GCTATTGATT AGGTTAATTC 2700 

ACGGATTGTT ATTTATTGAT AGGGGGTGCG TGTGTGTGTG TGTGTTGCAT TCACATGGGA 2760 

TCGTTCCAGG TTGTTGTTTC CTTCCATCCT GTTGAGTCAA AAGGAGTTTT GTTTTGTAAC 2820 

TCCGGACGAT GTCTTAGATA GAAGGTCGAT CTCCATGTGA TTGTTTGACT GCTACTCTGA 2880 

TTATGTAATC TGTAAAGCCT AGACGTTATG CAAGCATGTG ATTGTGGTTT TTGCAACCTG 2940 

TTTGCACGAC AAATGATCGA CAGTCGATTA CGTAATCCAT ATTATTTAGA GGGGTAATAA 3000 

AAAATAAATG GCAGCCAGAA TTTCAAACAT TTTGCAAACA ATGCAAAAGA TGAGAAACTC 3060 

CAACAGAAAA AATAAAAAAA CTCCGCAGCA CTCCGAACCA ACAAAACAAT GGGGGGCGCC 3120 

AGAATTATTG ACTATTGTGA CTTTTTTTTA TTTTTTCCGT TAACTTTCAT TGCAGTGAAG 3180 

TGTGTTACAC GGGGTGGTGA TGGTGTTGGT TTCTACAATG CAAGGGCACA GTTGAAGGTT 3240 

TCCACATAAC GTTGCACCAT ATCAACTCAA TTTATCCTCA TTCATGTGAT AAAAGAAGAG 3300

CCAAAAGGTA ATTGGCAGAC CCCCCAAGGG GAACACGGAG TAGAAAGCAA TGGAAACACG 3360

CCCATGACAG TGCCATTTAG CCCACAACAC ATCTAGTATT CTTTTTTTTT TTTGTGCGCA 3420 

GGTGCACACC TGGACTTTAG TTATTGCCCC ATAAAGTTAA CAATCTCACC TTTGGCTCTC 3480 

CCAGTGTCTC CGCCTCCAGA TGCTCGTTTT ACACCCTCGA GCTAACGACA ACACAACACC 3540 

CATGAGGGGA ATGGGCAAAG TTAAACACTT TTGGTTTCAA TGATTCCTAT TTGCTACTCT 3600 

CTTGTTTTGT GTTTTGATTT GCACCATGTG AAATAAACGA CAATTATATA TACCTTTTCG 3660 

TCTGTCCTCC AATGTCTCTT TTTGCTGCCA TTTTGCTTTT TGCTTTTTGC TTTTGCACTC 3720 

TCTCCCACTC CCACAATCAG TGCAGCAACA CACAA 3755 

(2) INFORMATION FOR SEQ ID NO: 88: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3900 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 88: 

GACATCATAA TGACCCGGTT ATTTCGCCCT CAGGTTGCTT ATTTGAGCCG TAAAGTGCAG 60 

TAGAAACTTT GCCTTGGGTT CAAACTCTAG TATAATGGTG ATAACTGGTT GCACTCTTGC 120 

CATAGGCATG AAAATAGGCC GTTATAGTAC TATATTTAAT AAGCGTAGGA GTATAGGATG 180 

CATATGACCG GTTTTTCTAT ATTTTTAAGA TAATCTCTAG TAAATTTTGT ATTCTCAGTA 240 

GGATTTCATC AAATTTCGCA ACCAATTCTG GCGAAAAAAT GATTCTTTTA CGTCAAAAGC 300 

TGAATAGTGC AGTTTAAAGC ACCTAAAATC ACATATACAG CCTCTAGATA CGACAGAGAA 360 

GCTCTTTATG ATCTGAAGAA GCATTAGAAT AGCTACTATG AGCCACTATT GGTGTATATA 420 

TTAGGGATTG GTGCAATTAA GTACGTACTA ATAAACAGAA GAAAATACTT AACCAATTTC 480 

TGGTGTATAC TTAGTGGTGA GGGACCTTTT CTGAACATTC GGGTCAAACT TTTTTTTGGA 540 

GTGCGACATC GATTTTTCGT TTGTGTAATA ATAGTGAACC TTTGTGTAAT AAATCTTCAT 600 

GCAAGACTTG CATAATTCGA GCTTGGGAGT TCACGCCAAT TTGACCTCGT TCATGTGATA 660 

AAAGAAAAGC CAAAAGGTAA TTAGCAGACG CAATGGGAAC ATGGAGTGGA AAGCAATGGA 720 

AGCACGCCCA GGACGGAGTA ATTTAGTCCA CACTACATCT GGGGGTTTTT TTTTTGTGCG 780 

CAAGTACACA CCTGGACTTT AGTTTTTGCC CCATAAAGTT AACAATCTAA CCTTTGGCTC 840 

TCCAACTCTC TCCGCCCCCA AATATTCGTT TTTACACCCT CAAGCTAGCG ACAGCACAAC 900 

ACCCATTAGA GGAATGGGGC AAAGTTAAAC ACTTTTGGCT TCAATGATTC CTATTCGCTA 960 

CTACATTCTT CTCTTGTTTT GTGCTTTGAA TTGCACCATG TGAAATAAAC GACAATTATA 1020 

TATACCTTTT CATCCCTCCT CCTATATCTC TTTTTGCTAC ATTTTGTTTT TTACGTTTCT 1080 

TGCTTTTGCA CTCTCCCACT CCCACAAAGA AAAAAAAACT ACACTATGTC GTCTTCTCCA 1140 

TCGTTTGCCC AAGAGGTTCT CGCTACCACT AGTCCTTACA TCGAGTACTT TCTTGACAAC 1200 

TACACCAGAT GGTACTACTT CATACCTTTG GTGCTTCTTT CGTTGAACTT TATAAGTTTG 1260 

CTCCACACAA GGTACTTGGA ACGCAGGTTC CACGCCAAGC CACTCGGTAA CTTTGTCAGG 1320

GACCCTACGT TTGGTATCGC TACTCCGTTG CTTTTGATCT ACTTGAAGTC GAAAGGTACG 1380 

GTCATGAAGT TTGCTTGGGG CCTCTGGAAC AACAAGTACA TCGTCAGAGA CCCAAAGTAC 1440 

AAGACAACTG GGCTCAGGAT TGTTGGCCTC CCATTGATTG AAACCATGGA CCCAGAGAAC 1500

ATCAAGGCTG TTTTGGCTAC TCAGTTCAAT GATTTCTCTT TGGGAACCAG ACACGATTTC 1560 

TTGTACTCCT TGTTGGGTGA CGGTATTTTC ACCTTGGACG GTGCTGGCTG GAAACATAGT 1620 

AGAACTATGT TGAGACCACA GTTTGCTAGA GAACAGGTTT CTCACGTCAA GTTGTTGGAG 1680 

CCACACGTTC AGGTGTTCTT CAAGCACGTT AGAAAGCACC GCGGTCAAAC GTTCGACATC 1740 

CAAGAATTGT TCTTCAGGTT GACCGTCGAC TCCGCCACCG AGTTCTTGTT TGGTGAGTCT 1800 

GCTGAATCCT TGAGGGACGA ATCTATTGGA TTGACCCCAA CCACCAAGGA TTTCGATGGC 1860 

AGAAGAGATT TCGCTGACGC TTTCAACTAT TCGCAGACTT ACCAGGCCTA CAGATTTTTG 1920 

TTGCAACAAA TGTACTGGAT CTTGAATGGC TCGGAATTCA GAAAGTCGAT TGCTGTCGTG 1980 

CACAAGTTTG CTGACCACTA TGTGCAAAAG GCTTTGGAGT TGACCGACGA TGACTTGCAG 2040 

AAACAAGACG GCTATGTGTT CTTGTACGAG TTGGCTAAGC AAACCAGAGA CCCAAAGGTC 2100 

TTGAGAGACC AGTTATTGAA CATTTTGGTT GCCGGTAGAG ACACGACCGC CGGTTTGTTG 2160 

TCATTTGTTT TCTACGAGTT GTCAAGAAAC CCTGAGGTGT TTGCTAAGTT GAGAGAGGAG 2220 

GTGGAAAACA GATTTGGACT CGGTGAAGAA GCTCGTGTTG AAGAGATCTC GTTTGAGTCC 2280 

TTGAAGTCTT GTGAGTACTT GAAGGCTGTC ATCAATGAAA CCTTGAGATT GTACCCATCG 2340

GTTCCACACA ACTTTAGAGT TGCTACCAGA AACACTACCC TCCCAAGAGG TGGTGGTGAA 2400

GATGGATACT CGCCAATTGT CGTCAAGAAG GGTCAAGTTG TCATGTACAC TGTTATTGCT 2460 

ACCCACAGAG ACCCAAGTAT CTACGGTGCC GACGCTGACG TCTTCAGACC AGAAAGATGG 2520 

TTTGAACCAG AAACTAGAAA GTTGGGCTGG GCATACGTTC CATTCAATGG TGGTCCAAGA 2580 

ATCTGTTTGG GTCAACAGTT TGCCTTGACC GAAGCTTCAT ACGTCACTGT CAGATTGCTC 2640 

CAGGAGTTTG CACACTTGTC TATGGACCCA GACACCGAAT ATCCACCAAA ATTGCAGAAC 2700 

ACCTTGACCT TGTCGCTCTT TGATGGTGCT GATGTTAGAA TGTACTAAGG TTGCTTTTCC 2760 

TTGCTAATTT TCTTCTGTAT AGCTTGTGTA TTTAAATTGA ATCGGCAATT GATTTTTCTG 2820 

ATACCAATAA CCGTAGTGCG ATTTGACCAA AACCGTTCAA ACTTTTTGTT CTCTCGTTGA 2880 

CGTGCTCGCT CATCAGCACT GTTTGAAGAC GAAAGAGAAA ATTTTTTGTA AACAACACTG 2940 

TCCAAATTTA CCCAACGTGA ACCATTATGC AAATGAGCGG CCCTTTCAAC TGGTCGCTGG 3000 

AAGCATTCGG GGATATCTAC AACGCCCTTA AGTTTGAAAC AGACATTGAT TTAGACACCA 3060 

TAGATTTCAG CGGCATCAAG AATGACCTTG CCCACATTTT GACGACCCCA ACACCACTGG 3120 

AAGAATCACG CCAGAAACTA GGCGATGGAT CCAAGCCTGT GACCTTGCCC AATGGAGACG 3180 

AAGTGGAGTT GAACCAAGCG TTCCTAGAAG TTACCACATT ATTGTCGAAT GAGTTTGACT 3240 

TGGACCAATT GAACGCGGCA GAGTTGTTAT ACTACGCTGG CGACATATCC TACAAGAAGG 3300 

GCACATCAAT CGCAGACAGT GCCAGATTGT CTTATTATTT GAGAGCAAAC TACATCTTGA 3360 

ACATACTTGG GTATTTGATT TCGAAGCAGC GATTGGATTT GATAGTCACG GACAACGACG 3420 

CGTTGTTTGA TAGTATTTTG AAAAGTTTTG AAAAGATCTA CAAGTTGATA AGCGTGTTGA 3480 

ACGATATGAT TGACAAGCAA AAGGTGACAA GCGACATCAA CAGTCTAGCA TTCATCAATT 3540 

GCATCAACTA CTCGAGAGGT CAACTATTCT CCGCACACGA ACTTTTGGGA CTGGTTTTGT 3600 

TTGGATTGGT CGACATCTAT TTCAACCAGT TTGGCACATT AGACAACTAC AAGAAGGTAT 3660 

TGGCATTGAT ACTGAAGAAC ATCAGCGATG AAGACATCTT GATCATACAC TTCCTCCCAT 3720 

CGACACTACA ATTGTTTAAG CTGGTGTTGG ACAAGAAAGA CGACGCTGCA GTTGAACAGT 3780 

TCTACAAGTA CATCACTTCA ACAGTGTCAC GAGACTACAA CTCCAACATC GGCTCCACAG 3840 

CCAAAGATGA TATCGATTTG TCCAAAACCA AACTCAGTGG CTTTGAGGTG TTGACGAGTT 3900 

(2) INFORMATION FOR SEQ ID NO: 89: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3668 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 89:

CCTGCAGAAT TCGCGGCCGC GTCGACAGAG TAGCAGTTAT GCAAGCATGT GATTGTGGTT 60 

TTTGCAACCT GTTTGCACGA CAAATGATCG ACAGTCGATT ACGTAATCCA TATTATTTAG 120 

AGGGGTAATA AAAAATAAAT GGCAGCCAGA ATTTCAAACA TTTTGCAAAC AATGCAAAAG 180 

ATGAGAAACT CCAACAGAAA AAATAAAAAA ACTCCGCAGC ACTCCGAACC AACAAAACAA 240 

TGGGGGGCGC CAGAATTATT GACTATTGTG ACTTTTTTTT ATTTTTTCCG TTAACTTTCA 300 

TTGCAGTGAA GTGTGTTACA CGGGGTGGTG ATGGTGTTGG TTTCTACAAT GCAAGGGCAC 360 

AGTTGAAGGT TTCCACATAA CGTTGCACCA TATCAACTCA ATTTATCCTC ATTCATGTGA 420 

TAAAAGAAGA GCCAAAAGGT AATTGGCAGA CCCCCCAAGG GGAACACGGA GTAGAAAGCA 480 

ATGGAAACAC GCCCATGACA GTGCCATTTA GCCCACAACA CATCTAGTAT TCTTTTTTTT 540 

TTTTGTGCGC AGGTGCACAC CTGGACTTTA GTTATTGCCC CATAAAGTTA ACAATCTCAC 600 

CTTTGGCTCT CCCAGTGTCT CCGCCTCCAG ATGCTCGTTT TACACCCTCG AGCTAACGAC 660 

AACACAACAC CCATGAGGGG AATGGGCAAA GTTAAACACT TTTGGTTTCA ATGATTCCTA 720 

TTTGCTACTC TCTTGTTTTG TGTTTTGATT TGCACCATGT GAAATAAACG ACAATTATAT 780 

ATACCTTTTC GTCTGTCCTC CAATGTCTCT TTTTGCTGCC ATTTTGCTTT TTGCTTTTTG 840

CTTTTGCACT CTCTCCCACT CCCACAATCA GTGCAGCAAC ACACAAAGAA GAAAAATAAA 900 

AAAACCTACA CTATGTCGTC TTCTCCATCG TTTGCTCAGG AGGTTCTCGC TACCACTAGT 960 

CCTTACATCG AGTACTTTCT TGACAACTAC ACCAGATGGT ACTACTTCAT CCCTTTGGTG 1020 

CTTCTTTCGT TGAACTTCAT CAGCTTGCTC CACACAAAGT ACTTGGAACG CAGGTTCCAC 1080 

GCCAAGCCGC TCGGTAACGT CGTGTTGGAT CCTACGTTTG GTATCGCTAC TCCGTTGATC 1140 

TTGATCTACT TAAAGTCGAA AGGTACAGTC ATGAAGTTTG CCTGGAGCTT CTGGAACAAC 1200 

AAGTACATTG TCAAAGACCC AAAGTACAAG ACCACTGGCC TTAGAATTGT CGGCCTCCCA 1260

TTGATTGAAA CCATAGACCC AGAGAACATC AAAGCTGTGT TGGCTACTCA GTTCAACGAT 1320

TTCTCCTTGG GAACTAGACA CGATTTCTTG TACTCCTTGT TGGGCGATGG TATTTTTACC 1380

TTGGACGGTG CTGGCTGGAA ACACAGTAGA ACTATGTTGA GACCACAGTT TGCTAGAGAA 1440

CAGGTTTCCC ACGTCAAGTT GTTGGAACCA CACGTTCAGG TGTTCTTCAA GCACGTTAGA 1500

AAACACCGCG GTCAGACTTT TGACATCCAA GAATTGTTCT TCAGATTGAC CGTCGACTCC 1560

GCCACCGAGT TCTTGTTTGG TGAGTCTGCT GAATCCTTGA GAGACGACTC TGTTGGTTTG 1620

ACCCCAACCA CCAAGGATTT CGAAGGCAGA GGAGATTTCG CTGACGCTTT CAACTACTCG 1680

CAGACTTACC AGGCCTACAG ATTTTTGTTG CAACAAATGT ACTGGATTTT GAATGGCGCG 1740

GAATTCAGAA AGTCGATTGC CATCGTGCAC AAGTTTGCTG ACCACTATGT GCAAAAGGCT 1800

TTGGAGTTGA CCGACGATGA CTTGCAGAAA CAAGACGGCT ATGTGTTCTT GTACGAGTTG 1860

GCTAAGCAAA CTAGAGACCC AAAGGTCTTG AGAGACCAGT TGTTGAACAT TTTGGTTGCC 1920

GGTAGAGACA CGACCGCCGG TTTGTTGTCG TTTGTGTTCT ACGAGTTGTC GAGAAACCCT 1980

GAAGTGTTTG CCAAGTTGAG AGAGGAGGTG GAAAACAGAT TTGGACTCGG CGAAGAGGCT 2040

CGTGTTGAAG AGATCTCTTT TGAGTCCTTG AAGTCCTGTG AGTACTTGAA GGCTGTCATC 2100

AATGAAGCCT TGAGATTGTA CCCATCTGTT CCACACAACT TCAGAGTTGC CACCAGAAAC 2160

ACTACCCTTC CAAGAGGCGG TGGTAAAGAC GGATGCTCGC CAATTGTTGT CAAGAAGGGT 2220

CAAGTTGTCA TGTACACTGT CATTGGTACC CACAGAGACC CAAGTATCTA CGGTGCCGAC 2280

GCCGACGTCT TCAGACCAGA AAGATGGTTC GAGCCAGAAA CTAGAAAGTT GGGCTGGGCA 2340

TATGTTCCAT TCAATGGTGG TCCAAGAATC TGTTTGGGTC AGCAGTTTGC CTTGACTGAA 2400

GCTTCATACG TCACTGTCAG ATTGCTCCAA GAGTTTGGAA ACTTGTCCCT GGATCCAAAC 2460

GCTGAGTACC CACCAAAATT GCAGAACACC TTGACCTTGT CACTCTTTGA TGGTGCTGAC 2520

GTTAGAATGT TCTAAGGTTG CTTATCCTTG CTAGTGTTAT TTATAGTTTG TGTATTTAAA 2580

TTGAATCGGC GATTGATTTT TCTGGTACTA ATAACTGTAG TGGGTTTTGA CCAAAACCGT 2640

TCAAACTTTT TTTTTTTTTT TCTTCCCCCT ACCTTCGTTG CTCGCTCATC AGCACTGTTT 2700

GAAAACGAAA AAAGAAAATT TTTTGTAAAC AACATTGCCC AAACTTACCC AACGTGAACC 2760

ATTATAACCA AATGAGCGGC GCTTTCAACT GGTCACTGGA GGCATTCGGG GATATCTACA 2820

ACACCCTTAA GTTTGAGGAA GACATTGATT TAGACACCAT AGATTTCAGC GGCATCAAGA 2880

ATGACCTTGT CCACATTTTG ACAACCCCAA CACCACTGGA AGAATCGCGC CAGAAACTAG 2940

GCGATGGATC CAAGCCTGTG GCCTTGCCCA ATGGAGACGA AGTGGAGTTG AACCAAGCGT 3000

TCCTAGAAGT TACCACATTA TTGTCGAACG AGTTTGACTT GGACCAATTG AACGCGGCCG 3060

AGTTGTTATA CTACGCCGGC GACATATCCT ACAAGAAGGG CACATCAATT GCCGACAGTG 3120

CCAGATTGTC TTACTATTTG AGAGCAAACT ACATCTTGAA CATACTTGGG TACTTTATTT 3180

CGAAGCAGCG ATTGGATGTG ATAGTCACCG ACAACAACGC GTTGTTTGAT AATATTTTGA 3240

AAAGTTTTGA AAAGATCTAC AAGTTGATAA GCGCGTTGAA CGATATGATT GACAAGCAAA 3300

AGGTGACAAG CGACATCAAC AGTCTAGCAT TTATCAACTG CATCAACTAC TCGAGGGGTC 3360

AACTATTCTC CGCACACGAA CTTTTGGGAC TGGTTTTGTT TGGATTGGTT GACAACTATT 3420

TCAACCAGTT TGGCTCATTA GACAACTACA AGAAAGTATT GGCATTGATA CTGAAGAACA 3480

TCAGTGATGA AGATATCTTG ATCGTACGCT TCCTCCCATC GACACTACAA TTGTTTAAGC 3540

TGGTGTTGGA TAAGAAAGAC GACGCCACTG TTGACCAGTT CTACAAGTAC ATCACCTCAA 3600

CAGTGTCGCA AGACTACAAC TCCAACATCG GAGCCACAGC CAAAGATGAT ATCGATTTGT 3660

CCAAAGCC 3668

(2) INFORMATION FOR SEQ ID NO: 90:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3826 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 90:

TGGAGTCGCC AGACTTGCTC ACTTTTGACT CCCTTCGAAA CTCAAAGTAC GTTCAGGCGG 60

TGCTCAACGA AACGCTCCGT ATCTACCCGG GGGTACCACG AAACATGAAG ACAGCTACGT 120

GCAACACGAC GTTGCCACGC GGAGGAGGCA AAGACGGCAA GGAACCTATC TTGGTGCAGA 180

AGGGACAGTC CGTTGGGTTG ATTACTATTG CCACGCAGAC GGACCCAGAG TATTTTGGGG 240

CCGACGCTGG TGAGTTTAAG CCGGAGAGAT GGTTTGATTC AAGCATGAAG AACTTGGGGT 300

GTAAATACTT GCCGTTCAAT GCTGGGCCAC GGACTTGCTT GGGGCAGCAG TACACTTTGA 360

TTGAAGCGAG CTACTTGCTA GTCCGGTTGG CCCAGACCTA CCGGGCAATA GATTTGCAGC 420

CAGGATCGGC GTACCCACCA AGAAAGAAGT CGTTGATCAA CATGAGTGCT GCCGACGGGG 480 

TGTTTGTAAA GCTTTATAAG GATGTAACGG TAGATGGATA GTTGTGTAGG AGGAGCGGAG 540 

ATAAATTAGA TTTGATTTTG TGTAAGGTTT TGGATGTCAA CCTACTCCGC ACTTCATGCA 600 

GTGTGTGTGA CACAAGGGTG TACTACGTGT GCGTGTGCGC CAAGAGACAG CCCAAGGGGG 660 

TGGTAGTGTG TGTTGGCGGA AGTGCATGTG ACACAACGCG TGGGTTCTGG CCAATGGTGG 720 

ACTAAGTGCA GGTAAGCAGC GACCTGAAAC ATTCCTCAAC GCTTAAGACA CTGGTGGTAG 780 

AGATGCGGAC CAGGCTATTC TTGTCGTGCT ACCCGGCGCA TGGAAAATCA ACTGCGGGAA 840 

GAATAAATTT ATCCGTAGAA TCCACAGAGC GGATAAATTT GCCCACCTCC ATCATCAACC 900 

ACGCCGCCAC TAACTACATC ACTCCCCTAT TTTCTCTCTC TCTCTTTGTC TTACTCCGCT 960 

CCCGTTTCCT TAGCCACAGA TACACACCCA CTGCAAACAG CAGCAACAAT TATAAAGATA 1020 

CGCCAGGCCC ACCTTCTTTC TTTTTCTTCA CTTTTTTGAC TGCAACTTTC TACAATCCAC 1080 

CACAGCCACC ACCACAGCCG CTATGATTGA ACAACTCCTA GAATATTGGT ATGTCGTTGT 1140 

GCCAGTGTTG TACATCATCA AACAACTCCT TGCATACACA AAGACTCGCG TCTTGATGAA 1200 

AAAGTTGGGT GCTGCTCCAG TCACAAACAA GTTGTACGAC AACGCTTTCG GTATCGTCAA 1260 

TGGATGGAAG GCTCTCCAGT TCAAGAAAGA GGGCAGGGCT CAAGAGTACA ACGATTACAA 1320 

GTTTGACCAC TCCAAGAACC CAAGCGTGGG CACCTACGTC AGTATTCTTT TCGGCACCAG 1380 

GATCGTCGTG ACCAAAGATC CAGAGAATAT CAAAGCTATT TTGGCAACCC AGTTTGGTGA 1440 

TTTTTCTTTG GGCAAGAGGC ACACTCTTTT TAAGCCTTTG TTAGGTGATG GGATCTTCAC 1500 

ATTGGACGGC GAAGGCTGGA AGCACAGCAG AGCCATGTTG AGACCACAGT TTGCCAGAGA 1560 

ACAAGTTGCT CATGTGACGT CGTTGGAACC ACACTTCCAG TTGTTGAAGA AGCATATTCT 1620 

TAAGCACAAG GGTGAATACT TTGATATCCA GGAATTGTTC TTTAGATTTA CCGTTGATTC 1680 

GGCCACGGAG TTCTTATTTG GTGAGTCCGT GCACTCCTTA AAGGACGAAT CTATTGGTAT 1740 

CAACCAAGAC GATATAGATT TTGCTGGTAG AAAGGACTTT GCTGAGTCGT TCAACAAAGC 1800 

CCAGGAATAC TTGGCTATTA GAACCTTGGT GCAGACGTTC TACTGGTTGG TCAACAACAA 1860 

GGAGTTTAGA GACTGTACCA AGCTGGTGCA CAAGTTCACC AACTACTATG TTCAGAAAGC 1920 

TTTGGATGCT AGCCCAGAAG AGCTTGAAAA GCAAAGTGGG TATGTGTTCT TGTACGAGCT 1980 

TGTCAAGCAG ACAAGAGACC CCAATGTGTT GCGTGACCAG TCTTTGAACA TCTTGTTGGC 2040 

CGGAAGAGAC ACCACTGCTG GGTTGTTGTC GTTTGCTGTC TTTGAGTTGG CCAGACACCC 2100 

AGAGATCTGG GCCAAGTTGA GAGAGGAAAT TGAACAACAG TTTGGTCTTG GAGAAGACTC 2160 

TCGTGTTGAA GAGATTACCT TTGAGAGCTT GAAGAGATGT GAGTACTTGA AAGCGTTCCT 2220 

TAATGAAACC TTGCGTATTT ACCCAAGTGT CCCAAGAAAC TTCAGAATCG CCACCAAGAA 2280 

CACGACATTG CCAAGGGGCG GTGGTTCAGA CGGTACCTCG CCAATCTTGA TCCAAAAGGG 2340 

AGAAGCTGTG TCGTATGGTA TCAACTCTAC TCATTTGGAC CCTGTCTATT ACGGCCCTGA 2400 

TGCTGCTGAG TTCAGACCAG AGAGATGGTT TGAGCCATCA ACCAAAAAGC TCGGCTGGGC 2460 

TTACTTGCCA TTCAACGGTG GTCCAAGAAT CTGTTTGGGT CAGCAGTTTG CCTTGACGGA 2520 

AGCTGGCTAT GTGTTGGTTA GATTGGTGCA AGAGTTCTCC CACGTTAGGC TGGACCCAGA 2580 

CGAGGTGTAC CCGCCAAAGA GGTTGACCAA CTTGACCATG TGTTTGCAGG ATGGTGCTAT 2640 

TGTCAAGTTT GACTAGCGGC GTGGTGAATG CGTTTGATTT TGTAGTTTCT GTTTGCAGTA 2700 

ATGAGATAAC TATTCAGATA AGGCGAGTGG ATGTACGTTT TGTAAGAGTT TCCTTACAAC 2760 

CTTGGTGGGG TGTGTGAGGT TGAGGTTGCA TCTTGGGGAG ATTACACCTT TTGCAGCTCT 2820 

CCGTATACAC TTGTACTCTT TGTAACCTCT ATCAATCATG TGGGGGGGGG GGTTCATTGT 2880 

TTGGCCATGG TGGTGCATGT TAAATCCGCC AACTACCCAA TCTCACATGA AACTCAAGCA 2940 

CACTAAAAAA AAAAAAGATG TTGGGGGAAA ACTTTGGTTT CCCTTCTTAG TAATTAAACA 3000 

CTCTCACTCT CACTCTCACT CTCTCCACTC AGACAAACCA ACCACCTGGG CTGCAGACAA 3060 

CCAGAAAAAA AAAGAACAAA ATCCAGATAG AAAAACAAAG GGCTGGACAA CCATAAATAA 3120 

ACAATCTAGG GTCTACTCCA TCTTCCACTG TTTCTTCTTC TTCAGACTTA GCTAACAAAC 3180 

AACTCACTTC ACCATGGATT ACGCAGGCAT CACGCGTGGC TCCATCAGAG GCGAGGCCTT 3240 

GAAGAAACTC GCAGAATTGA CCATCCAGAA CCAGCCATCC AGCTTGAAAG AAATCAACAC 3300 

CGGCATCCAG AAGGACGACT TTGCCAAGTT GTTGTCTGCC ACCCCGAAAA TCCCCACCAA 3360 

GCACAAGTTG AACGGCAACC ACGAATTGTC TGAGGTCGCC ATTGCCAAAA AGGAGTACGA 3420 

GGTGTTGATT GCCTTGAGCG ACGCCACAAA AGACCCAATC AAAGTGACCT CCCAGATCAA 3480 

GATCTTGATT GACAAGTTCA AGGTGTACTT GTTTGAGTTG CCTGACCAGA AGTTCTCCTA 3540 

CTCCATCGTG TCCAACTCCG TCAACATCGC CCCCTGGACC TTGCTCGGGG AGAAGTTGAC 3600 

CACGGGCTTG ATCAACTTGG CCTTCCAGAA CAACAAGCAG CACTTGGACG AGGTCATTGA 3660 

CATCTTCAAC GAGTTCATCG ACAAGTTCTT TGGCAACACG GAGCCGCAAT TGACCAACTT 3720

CTTGACCTTG TGCGGTGTGT TGGACGGGTT GATTGACCAT GCCAACTTCT TGAGCGTGTC 3780

CTCGCGGACC TTCAAGATCT TCTTGAACTT GGACTCGTAT GTGGAC 3826 

(2) INFORMATION FOR SEQ ID NO: 91: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3910 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 91: 

TTACAATCAT GGAGCTCGCT AGGAACCCAG ATGTCTGGGA GAAGCTCCGC GAAGAGGTCA 60 

ACACGAACTT TGGCATGGAG TCGCCAGACT TGCTCACTTT TGACTCTCTT AGAAGCTCAA 120 

AGTACGTTCA GGCGGTGCTC AACGAAACGC TTCGTATCTA CCCGGGGGTG CCACGAAACA 180 

TGAAGACAGC TACGTGCAAC ACGACGTTGC CGCGTGGAGG AGGCAAAGAC GGTAAGGAAC 240 

CTATTTTGGT GCAGAAGGGC CAGTCCGTTG GGTTGATTAC TATTGCCACG CAGACGGACC 300 

CAGAGTATTT TGGGGCAGAT GCTGGTGAGT TCAAACCGGA GAGATGGTTT GATTCAAGCA 360 

TGAAGAACTT GGGGTGTAAG TACTTGCCGT TCAATGCTGG GCCCCGGACT TGTTTGGGGC 420 

AGCAGTACAC TTTGATTGAA GCGAGCTATT TGCTAGTCAG GTTGGCGCAG ACCTACCGGG 480 

TAATCGATTT GCTGCCAGGG TCGGCGTACC CACCAAGAAA GAAGTCGTTG ATCAATATGA 540 

GTGCTGCCGA TGGGGTGGTT GTAAAGTTTC ACAAGGATCT AGATGGATAT GTAAGGTGTG 600 

TAGGAGGAGC GGAGATAAAT TAGATTTGAT TTTGTGTAAG GTTTAGCACG TCAAGCTACT 660 

CCGCACTTTG TGTGTAGGGA GCACATACTC CGTCTGCGCC TGTGCCAAGA GACGGCCCAG 720 

GGGTAGTGTG TGGTGGTGGA AGTGCATGTG ACACAATACC CTGGTTCTGG CCAATTGGGG 780 

ATTTAGTGTA GGTAAGCTGC GACCTGAAAC ACTCCTCAAC GCTTGAGACA CTGGTGGGTA 840 

GAGATGCGGG CCAGGAGGCT ATTCTTGTCG TGCTACCCGT GCACGGAAAA TCGATTGAGG 900 

GAAGAACAAA TTTATCCGTG AAATCCACAG AGCGGATAAA TTTGTCACAT TGCTGCGTTG 960 

CCCACCCACA GCATTCTCTT TTCTCTCTCT TTGTCTTACT CCGCTCCTGT TTCCTTATCC 1020 

AGAAATACAC ACCAACTCAT ATAAAGATAC GCTAGCCCAG CTGTCTTTCT TTTTCTTCAC 1080 

TTTTTTTGGT GTGTTGCTTT TTTGGCTGCT ACTTTCTACA ACCACCACCA CCACCACCAC 1140 

CATGATTGAA CAAATCCTAG AATATTGGTA TATTGTTGTG CCTGTGTTGT ACATCATCAA 1200 

ACAACTCATT GCCTACAGCA AGACTCGCGT CTTGATGAAA CAGTTGGGTG CTGCTCCAAT 1260 

CACAAACCAG TTGTACGACA ACGTTTTCGG TATCGTCAAC GGATGGAAGG CTCTCCAGTT 1320 

CAAGAAAGAG GGCAGAGCTC AAGAGTACAA CGATCACAAG TTTGACAGCT CCAAGAACCC 1380 

AAGCGTCGGC ACCTATGTCA GTATTCTTTT TGGCACCAAG ATTGTCGTGA CCAAGGATCC 1440 

AGAGAATATC AAAGCTATTT TGGCAACCCA GTTTGGCGAT TTTTCTTTGG GCAAGAGACA 1500 

CGCTCTTTTT AAACCTTTGT TAGGTGATGG GATCTTCACC TTGGACGGCG AAGGCTGGAA 1560 

GCATAGCAGA TCCATGTTAA GACCACAGTT TGCCAGAGAA CAAGTTGCTC ATGTGACGTC 1620 

GTTGGAACCA CACTTCCAGT TGTTGAAGAA GCATATCCTT AAACACAAGG GTGAGTACTT 1680 

TGATATCCAG GAATTGTTCT TTAGATTTAC TGTCGACTCG GCCACGGAGT TCTTATTTGG 1740 

TGAGTCCGTG CACTCCTTAA AGGACGAAAC TATCGGTATC AACCAAGACG ATATAGATTT 1800 

TGCTGGTAGA AAGGACTTTG CTGAGTCGTT CAACAAAGCC CAGGAGTATT TGTCTATTAG 1860 

AATTTTGGTG CAGACCTTCT ACTGGTTGAT CAACAACAAG GAGTTTAGAG ACTGTACCAA 1920 

GCTGGTGCAC AAGTTTACCA ACTACTATGT TCAGAAAGCT TTGGATGCTA CCCCAGAGGA 1980 

ACTTGAAAAG CAAGGCGGGT ATGTGTTCTT GTATGAGCTT GTCAAGCAGA CGAGAGACCC 2040 

CAAGGTGTTG CGTGACCAGT CTTTGAACAT CTTGTTGGCA GGAAGAGACA CCACTGCTGG 2100 

GTTGTTGTCC TTTGCTGTGT TTGAGTTGGC CAGAAACCCA CACATCTGGG CCAAGTTGAG 2160 

AGAGGAAATT GAACAGCAGT TTGGTCTTGG AGAAGACTCT CGTGTTGAAG AGATTACCTT 2220 

TGAGAGCTTG AAGAGATGTG AGTACTTGAA AGCGTTCCTT AACGAAACCT TGCGTGTTTA 2280 

CCCAAGTGTC CCAAGAAACT TCAGAATCGC CACCAAGAAT ACAACATTGC CAAGGGGTGG 2340 

TGGTCCAGAC GGTACCCAGC CAATCTTGAT CCAAAAGGGA GAAGGTGTGT CGTATGGTAT 2400 

CAACTCTACC CACTTAGATC CTGTCTATTA TGGCCCTGAT GCTGCTGAGT TCAGACCAGA 2460 

GAGATGGTTT GAGCCATCAA CCAGAAAGCT CGGCTGGGCT TACTTGCCAT TCAACGGTGG 2520 

GCCACGAATC TGTTTGGGTC AGCAGTTTGC CTTGACCGAA GCTGGTTACG TTTTGGTCAG 2580 

ATTGGTGCAA GAGTTCTCCC ACATTAGGCT GGACCCAGAT GAAGTGTATC CACCAAAGAG 2640 

GTTGACCAAC TTGACCATGT GTTTGCAGGA TGGTGCTATT GTCAAGTTTG ACTAGTACGT 2700

ATGAGTGCGT TTGATTTTGT AGTTTCTGTT TGCAGTAATG AGATAACTAT TCAGATAAGG 2760

CGGGTGGATG TACGTTTTGT AAGAGTTTCC TTACAACCCT GGTGGGTGTG TGAGGTTGCA 2820 

TCTTAGGGAG AGATAGCACC TTTTGCAGCT CTCCGTATAC AGTTTTACTC TTTGTAACCT 2880 

ATGCCAATCA TGTGGGGATT CATTGTTTGC CCATGGTGGT GCATGCAAAA TCCCCCCAAC 2940

TACCCAATCT CACATGAAAC TCAAGCACAC TAGAAAAAAA AGATGTTGCG TGGGTTCTTT 3000 

TGATGTTGGG GAAAACTTTC GTTTCCTTTC TCAGTAATTA AACGTTCTCA CTCAGACAAA 3060 

CCACCTGGGC TGCAGACAAC CAGAAAAAAC AAAATCCAGA TAGAAGAAGA AAGGGCTGGA 3120 

CAACCATAAA TAAACAACCT AGGGTCCACT CCATCTTTCA CTTCTTCTTC TTCAGACTTA 3180 

TCTAACAAAC GACTCACTTC ACCATGGATT ACGCAGGTAT CACGCGTGGG TCCATCAGAG 3240 

GCGAAGCCTT GAAGAAACTC GCCGAGTTGA CCATCCAGAA CCAGCCATCC AGCTTGAAAG 3300 

AAATCAACAC CGGCATCCAG AAGGACGACT TTGCCAAGTT GTTGTCTTCC ACCCCGAAAA 3360 

TCCACACCAA GCACAAGTTG AATGGCAACC ACGAATTGTC CGAAGTCGCC ATTGCCAAAA 3420 

AGGAGTACGA GGTGTTGATT GCCTTGAGCG ACGCCACGAA AGAACCAATC AAAGTCACCT 3480 

CCCAGATCAA GATCTTGATT GACAAGTTCA AGGTGTACTT GTTTGAGTTG CCCGACCAGA 3540

AGTTCTCCTA CTCCATCGTG TCCAACTCCG TTAACATTGC CCCCTGGACC TTGCTCGGTG 3600 

AGAAGTTGAC CACGGGCTTG ATCAACTTGG CGTTCCAGAA CAACAAGCAG CACTTGGACG 3660 

AAGTCATCGA CATCTTCAAC GAGTTCATCG ACAAGTTCTT TGGCAACACA GAGCCGCAAT 3720 

TGACCAACTT CTTGACCTTG TCCGGTGTGT TGGACGGGTT GATTGACCAT GCCAACTTCT 3780 

TGAGCGTGTC CTCCAGGACC TTCAAGATCT TCTTGAACTT GGACTCGTTT GTGGACAACT 3840 

CGGACTTCTT GAACGACGTG GAGAACTACT CCGACTTTTT GTACGACGAG CCGAACGAGT 3900 

ACCAGAACTT 3910 
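
The number printed at the end of each sequence line is the cumulative base count, and the last one should equal the LENGTH given under SEQUENCE CHARACTERISTICS. The following is a minimal sketch of how that consistency could be checked mechanically. It assumes each line holds whitespace-separated blocks of A, C, G and T followed by an optional count; the function name `check_running_counts` and the example lines are hypothetical and are not taken from the listing.

```python
import re

def check_running_counts(lines):
    """Return (total_bases, mismatches) for sequence-listing lines.

    Each non-blank line is assumed to hold blocks of bases, optionally
    followed by the cumulative base count. A mismatch is recorded as
    (printed_count, computed_count).
    """
    total = 0
    mismatches = []
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # blank spacer line between sequence rows
        printed = None
        if tokens[-1].isdigit():
            printed = int(tokens.pop())  # trailing cumulative count
        bases = "".join(tokens).upper()
        if not re.fullmatch(r"[ACGT]*", bases):
            raise ValueError(f"unexpected character in: {line!r}")
        total += len(bases)
        if printed is not None and printed != total:
            mismatches.append((printed, total))
    return total, mismatches

# Hypothetical example: the second count is deliberately wrong.
example = ["ACGTACGTAC GTACGTACGT 20", "", "ACGTA 26"]
total, mismatches = check_running_counts(example)
print(total, mismatches)
```

A non-empty mismatch list, or a total that differs from the declared LENGTH, points to a transcription error in either the bases or the printed count.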



(2) INFORMATION FOR SEQ ID NO: 92:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3150 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 92:

GAATTCTTTG GATCTAATTC CAGCTGATCT TGCTAATCCT TATCAACGTA GTTGTGATCA 60 

TTGTTTGTCT GAATTATACA CACCAGTGGA AGAATATGGT CTAATTTGCA CGTCCCACTG 120 

GCATTGTGTG TTTGTGGGGG GGGGGGGGTG CACACATTTT TAGTGCCATT CTTTGTTGAT 180 

TACCCCTCCC CCCTATCATT CATTCCCACA GGATTAGTTT TTTCCTCACT GGAATTCGCT 240 

GTCCACCTGT CAACCCCCCC CCCCCCCCCC CCCACTGCCC TACCCTGCCC TGCCCTGCAC 300 

GTCCTGTGTT TTGTGCTGTG TCTTTCCCAC GCTATAAAAG CCCTGGCGTC CGGCCAAGGT 360 

TTTTCCACCC AGCCAAAAAA ACAGTCTAAA AAATTTGGTT GATCCTTTTT GGTTGCAAGG 420 

TTTTCCACCA CCACTTCCAC CACCTCAACT ATTCGAACAA AAGATGCTCG ATCAGATCTT 480 

ACATTACTGG TACATTGTCT TGCCATTGTT GGCCATTATC AACCAGATCG TGGCTCATGT 540 

CAGGACCAAT TATTTGATGA AGAAATTGGG TGCTAAGCCA TTCACACACG TCCAACGTGA 600 

CGGGTGGTTG GGCTTCAAAT TCGGCCGTGA ATTCCTCAAA GCAAAAAGTG CTGGGAGACT 660 

GGTTGATTTA ATCATCTCCC GTTTCCACGA TAATGAGGAC ACTTTCTCCA GCTATGCTTT 720 

TGGCAACCAT GTGGTGTTCA CCAGGGACCC CGAGAATATC AAGGCGCTTT TGGCAACCCA 780 

GTTTGGTGAT TTTTCATTGG GCAGCAGGGT CAAGTTCTTC AAACCATTAT TGGGGTACGG 840 

TATCTTCACA TTGGACGCCG AAGGCTGGAA GCACAGCAGA GCCATGTTGA GACCACAGTT 900 

TGCCAGAGAA CAAGTTGCTC ATGTGACGTC GTTGGAACCA CACTTCCAGT TGTTGAAGAA 960 

GCATATCCTT AAACACAAGG GTGAGTACTT TGATATCCAG GAATTGTTCT TTAGATTTAC 1020 

TGTCGACTCG GCCACGGAGT TCTTATTTGG TGAGTCCGTG CACTCCTTAA AGGACGAGGA 1080 

AATTGGCTAC GACACGAAAG ACATGTCTGA AGAAAGACGC AGATTTGCCG ACGCGTTCAA 1140 

CAAGTCGCAA GTCTACGTGG CCACCAGAGT TGCTTTACAG AACTTGTACT GGTTGGTCAA 1200 

CAACAAAGAG TTCAAGGAGT GCAATGACAT TGTCCACAAG TTTACCAACT ACTATGTTCA 1260 

GAAAGCCTTG GATGCTACCC CAGAGGAACT TGAAAAGCAA GGCGGGTATG TGTTCTTGTA 1320 

TGAGCTTGTC AAGCAGACGA GAGACCCCAA GGTGTTGCGT GACCAGTCTT TGAACATCTT 1380 

GTTGGCAGGA AGAGACACCA CTGCTGGGTT GTTGTCCTTT GCTGTGTTTG AGTTGGCCAG 1440 

AAACCCACAC ATCTGGGCCA AGTTGAGAGA GGAAATTGAA CAGCAGTTTG GTCTTGGAGA 1500

AGACTCTCGT GTTGAAGAGA TTACCTTTGA GAGCTTGAAG AGATGTGAGT ACTTGAAGGC 1560

CGTGTTGAAC GAAACTTTGA GATTACACCC AAGTGTCCCA AGAAACGCAA GATTTGCGAT 1620

TAAAGACACG ACTTTACCAA GAGGCGGTGG CCCCAACGGC AAGGATCCTA TCTTGATCAG 1680

GAAGGATGAG GTGGTGCAGT ACTCCATCTC GGCAACTCAG ACAAATCCTG CTTATTATGG 1740

CGCCGATGCT GCTGATTTTA GACCGGAAAG ATGGTTTGAA CCATCAACTA GAAACTTGGG 1800

ATGGGCTTTC TTGCCATTCA ACGGTGGTCC AAGAATCTGT TTGGGACAAC AGTTTGCTTT 1860

GACTGAAGCC GGTTACGTTT TGGTTAGACT TGTTCAGGAG TTTCCAAACT TGTCACAAGA 1920

CCCCGAAACC AAGTACCCAC CACCTAGATT GGCACACTTG ACGATGTGCT TGTTTGACGG 1980

TGCACACGTC AAGATGTCAT AGGTTTCCCC ATACAAGTAG TTCAGTAATT ATACACTGTT 2040

TTTACTTTCT CTTCATACCA AATGGACAAA AGTTTTAAGC ATGCCTAACA ACGTGACCGG 2100

ACAATTGTGT CGCACTAGTA TGTAACAATT GTAAAAATAG TGTACACTAA TTTGTGGTGG 2160

CCGGAGATAA ATTACAGTTT GGTTTTGTGT AAACTCGCGG ATATCTCTGG CAGTTTCTCT 2220

TCTCCGCAGC AGCTTTGCCA CGGGTTTGCT CTGGGGCCAA CAAATTCAAA AGGGGGAGAA 2280

ACTTAACACC CCTTATCTCT CCACTCTAGG TTGTAGCTCT TGTGGGGATG CAATTGTCGT 2340

ACGTTTTTTA TGTTTTGTCT AGACTTTGAT GATTACGTTG GATTTCTTAT GTCTGAGGCG 2400

TGCTTGAAAG AAGTGTCAAA ATGTGACAGG CGACGCTATT CGACATGAAC GCGAAAGGGT 2460

TATTTGCATC AATACGAGGG GCTGACTCTA GTCTAGGATG GCAGTCCTAG GTTGCAAACA 2520

TGTTGCACCA TATCCCTCCT GGAGTTGGTC GACCTCGCCT ACGCCACCCT CAGCGATCGG 2580

CACTTTCCGT TGTTCAATAT TTCTCCTTCC CATTGTTCCA GGGGTTATCA ACAACGTTGC 2640

CGGCCTCCTC CCCAAATTAC AAGAAAAATA AATTGTCGCA CGGCACCGAT CTGTCAAAGA 2700

TACAGATAAA CCTTAAATCT GCAAAAACAA GACCCCTCCC CATAGCCTAG AAGCACCAGC 2760

AAGATGATGG AGCAACTCCT CCAGTACTGG TACATCGCAC TCTCTGTATG GTTCATCCTT 2820

CGCTACTTGG CTTCCCACGC ACGAGCCGTC TACTTGCGCC ACAAGCTCGG CGCGGCGCCA 2880

TTCACGCACA CCCAGTACGA CGGCTGGTAT GGGTTCAAGT TTGGGCGGGA GTTTCTCAAG 2940

GCGAAGAAGA TCGGGCGGCA GACGGACTTG GTGCATGCGC GGTTCCGTGG CGGCATGGAC 3000

ACCTTCTCGA GCTACACTTT CGGCATCCAT ATCATCCTTA CCCGGGACCC GGAGAACATC 3060

AAGGCGGTCT TGGCGACGCA GTTCGATGAC TTCTCGCTCG GTGGCAGGAT CAGGTTCTTG 3120

AAGCCGTTGT TGGGGTATGG GATATTCACG 3150



(2) INFORMATION FOR SEQ ID NO: 93:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3579 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 93:

AAAACCGATA CAAGAAGAAG ACAGTCAACA AGAACGTTAA TGTCAACCAG GCGCCAAGAA 60

GACGGTTTGG CGGACTTGGA AGAATGTGGC ATTTGCCCAT GATGTTTATG TTCTGGAGAG 120

GTTTTTCAAG GAATCGTCAT CCTCCGCCAC CACAAGAACC ACCAGTTAAC GAGATCCATA 180

TTCACAACCC ACCGCAAGGT GACAATGCTC AACAACAACA GCAACAACAA CAACCCCCAC 240

AAGAACAGTG GAATAATGCC AGTCAACAAA GAGTGGTGAC AGACGAGGGA GAAAACGCAA 300

GCAACAGTGG TTCTGATGCA AGATCAGCTA CACCGCTTCA TCAGGAAAAG CAGGAGCTCC 360

CACCACCATA TGCCCATCAC GAGCAACACC AGCAGGTTAG TGTATAGTAG TCTGTAGTTA 420

AGTCAATGCA ATGTACCAAT AAGACTATCC CTTCTTACAA CCAAGTTTTC TGCCGCGCCT 480

GTCTGGCAAC AGATGCTGGC CGACACACTT TCAACTGAGT TTGGTCTAGA ATTCTTGCAC 540

ATGCACGACA AGGAAACTCT TACAAAGACA ACACTTGTGC TCTGATGCCA CTTGATCTTG 600

CTAAGCCTTA TCAACGTAAT TGAGATCATT GTTTGTCTGA ATTATACACA CCAGTGGAAG 660

AATCTGGTCT AATCTGCACG CCTCATGGGC ATTGTGTGTT TTGGGGGGGG GGGGGGGGGT 720

GCACACATTT TTAGTGCGAA TGTTTGTTTG CTGGTTCCCC CTCCCCCCTC CCCCCTATCA 780

TGCCCACAGG ATTAGTTTTT TCCTCACTGG AATTCGCTGT CCACCTGTCA ACCCCCTCAC 840

TGCCCTGCCC TGCCCTGCAC GCCCTGTGTT TTGTGCTGTG GCACTCCCAC GCTATAAAAG 900

CCCTGGCGTA CGGCCAAGGT TTTTCCTCAC AGCCAAAAAA AAATTTGGCT GATCCTTTTG 960

GGCTGCAAGG TTTTTCACCA CCACCACCAC CACCACCTCA ACTATTCAAA CAAAGGATGC 1020

TCGACCAGAT CTTCCATTAC TGGTACATTG TCTTGCCATT GTTGGTCATT ATCAAGCAGA 1080

TCGTGGCTCA TGCCAGGACC AATTATTTGA TGAAGAAGTT GGGCGCTAAG CCATTCACAC 1140

ATGTCCAACT AGACGGGTGG TTTGGCTTCA AATTTGGCCG TGAATTCCTC AAAGCTAAAA 1200

GTGCTGGGAG GCAGGTTGAT TTAATCATCT CCCGTTTCCA CGATAATGAG GACACTTTCT 1260

CCAGCTATGC TTTTGGCAAC CATGTGGTGT TCACCAGGGA CCCCGAGAAT ATCAAGGCGC 1320 

TTTTGGCAAC CCAGTTTGGT GATTTTTCAT TGGGAAGCAG GGTCAAATTC TTCAAACCAT 1380 

TGTTGGGGTA CGGTATCTTC ACCTTGGACG GCGAAGGCTG GAAGCACAGC AGAGCCATGT 1440 

TGAGACCACA GTTTGCCAGA GAGCAAGTTG CTCATGTGAC GTCGTTGGAA CCACATTTCC 1500 

AGTTGTTGAA GAAGCATATT CTTAAGCACA AGGGTGAATA CTTTGATATC CAGGAATTGT 1560 

TCTTTAGATT TACCGTTGAT TCAGCGACGG AGTTCTTATT TGGTGAGTCC GTGCACTCCT 1620 

TAAGGGACGA GGAAATTGGC TACGATACGA AGGACATGGC TGAAGAAAGA CGCAAATTTG 1680 

CCGACGCGTT CAACAAGTCG CAAGTCTATT TGTCCACCAG AGTTGCTTTA CAGACATTGT 1740 

ACTGGTTGGT CAACAACAAA GAGTTCAAGG AGTGCAACGA CATTGTCCAC AAGTTCACCA 1800 

ACTACTATGT TCAGAAAGCC TTGGATGCTA CCCCAGAGGA ACTTGAAAAA CAAGGCGGGT 1860

ATGTGTTCTT GTACGAGCTT GCCAAGCAGA CGAAAGACCC CAATGTGTTG CGTGACCAGT 1920 

CTTTGAACAT CTTGTTGGCT GGAAGGGACA CCACTGCTGG GTTGTTGTCC TTTGCTGTGT 1980 

TTGAGTTGGC CAGGAACCCA CACATCTGGG CCAAGTTGAG AGAGGAAATT GAATCACACT 2040 

TTGGGCTGGG TGAGGACTCT CGTGTTGAAG AGATTACCTT TGAGAGCTTG AAGAGATGTG 2100 

AGTACTTGAA AGCCGTGTTG AACGAAACGT TGAGATTACA CCCAAGTGTC CCAAGAAACG 2160 

CAAGATTTGC GATTAAAGAC ACGACTTTAC CAAGAGGCGG TGGCCCCAAC GGCAAGGATC 2220 

CTATCTTGAT CAGAAAGAAT GAGGTGGTGC AATACTCCAT CTCGGCAACT CAGACAAATC 2280 

CTGCTTATTA TGGCGCCGAT GCTGCTGATT TTAGACCGGA AAGATGGTTT GAGCCATCAA 2340 

CTAGAAACTT GGGATGGGCT TACTTGCCAT TCAACGGTGG TCCAAGAATC TGCTTGGGAC 2400 

AACAGTTTGC TTTGACCGAA GCCGGTTACG TTTTGGTTAG ACTTGTTCAG GAATTCCCTA 2460 

GCTTGTCACA GGACCCCGAA ACTGAGTACC CACCACCTAG ATTGGCACAC TTGACGATGT 2520 

GCTTGTTTGA CGGGGCATAC GTCAAGATGC AATAGGTTTT GGTTTGACTT TGTTTCCATA 2580 

TGCAAGTAGT TCAGTAATTA CACACTAATT TGTGGTGGCC GGCGATAAAT TACCGTTTGG 2640 

TTTTGTGTAA AAATTCGGAC ATCTCTGGTG GTTTCCCTTC TCCGCAGCAG CTTTGCCACG 2700 

GGTTTGCTCT GCGGCCAACA AATTCGAAAG GGGGGGGGGG GGGGGAGAAA GTTAACACCC 2760 

CCTGTTCCCA CCGTAGGCTG TAGCTCTTGT GGGGGGATGT AATTGTCGTA CGTTTTCATG 2820 

TTTGGCCCAG ACTTTGATGA TTACGTAGGC TTTCTTATGT CTAAGGCGTG CTTGACACAA 2880 

GTGTCAAAAG GTGACAGGCG ACGTTATTCG ACATGAACGC AAAAGGGTAA TTTGCATCGA 2940 

TACGAGGGGT TGCCTCTGGT CTAAGAAGGA CCCCCCAGGT TGCAAACATG TTGCACTGCA 3000 

TCCCACTCAG AGTTGGTCGA CCACGCCTAC GCTTACCCTC AGCGATCGGC ACTTTCCGTT 3060 

GCTCAATATT TCTCTCCCCC CTGCTTCCCC CCATTGTTCC AGGGATTATC AACAACGTTG 3120 

CCGGTCTCCT CTCCCCCCCC TCCCCCCAGT TATGTACAAG AAAATTAAAT TGTCGCACGG 3180 

CACCGATACG TCAAAGATAC AGAGAAACCT TAATCCCTCC CATAGCCTAG AAGCATCAAA 3240 

AAGATGATTG AGCAACTCCT CCAGTACTGG TACATTGCAC TCCCTGTATG GTTCATTCTC 3300 

CGCTACGTGG CTTCCCACGC ACGAACCATC TACTTGCGCC ACAAGCTCGG CGCGGCGCCG 3360 

TTCACGCACA CCCAGTACGA CGGATGGTAT GGGTTCAAGT TTGGGCGGGA GTTTCTCAAG 3420 

GCGAAGAAGA TTGGAAGGCA GACGGACTTG GTGCATGCGC GGTTCCGTGG AGGGGGCATG 3480 

GATACTTTCT CGAGCTATAC TTTCGGCATC CATATCATTC TTACTCGGGA CCCGGAGAAC 3540 

ATCAAGGCGG TCTTGGCGAC GCAGTTCGAT GACTTTTCG 3579 

(2) INFORMATION FOR SEQ ID NO: 94: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3348 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 94: 

GATGTGGTGC TTGATTTCTC GAGACACATC CTTGTGAGGT GCCATGAATC TGTACCTGTC 60 

TGTAAGCACA GGGAACTGCT TCAACACCTT ATTGCATATT CTGTCTATTG CAAGCGTGTG 120 

CTGCAACGAT ATCTGCCAAG GTATATAGCA GAACGTGCTG ATGGTTCCTC CGGTCATATT 180 

CTGTTGGTAG TTCTGCAGGT AAATTTGGAT GTCAGGTAGT GGAGGGAGGT TTGTATCGGT 240 

TGTGTTTTCT TCTTCCTCTC TCTCTGATTC AACCTCCACG TCTCCTTCGG GTTCTGTGTC 300 

TGTGTCTGAG TCGTACTGTT GGATTAAGTC CATCGCATGT GTGAAAAAAA GTAGCGCTTA 360 

TTTAGACAAC CAGTTCGTTG GGCGGGTATC AGAAATAGTC TGTTGTGCAC GACCATGAGT 420 

ATGCAACTTG ACGAGACGTC GTTAGGAATC CACAGAATGA TAGCAGGAAG CTTACTACGT 480 

GAGAGATTCT GCTTAGAGGA TGTTCTCTTC TTGTTGATTC CATTAGGTGG GTATCATCTC 540 

CGGTGGTGAC AACTTGACAC AAGCAGTTCC GAGAACCACC CACAACAATC ACCATTCCAG 600 

CTATCACTTC TACATGTCAA CCTACGATGT ATCTCATCAC CATCTAGTTT CTTGGCAATC 660 

GTTTATTTGT TATGGGTCAA CATCCAATAC AACTCCACCA ATGAAGAAGA AAAACGGAAA 720 

GCAGAATACC AGAATGACAG TGTGAGTTCC TGACCATTGC TAATCTATGG CTATATCTAG 780 

TTTGCTATCG TGGGATGTGA TCTGTGTCGT CTTCATTTGC GTTTGTGTTT ATTTCGGGTA 840 

TGAATATTGT TATACTAAAT ACTTGATGCA CAAACATGGC GCTCGAGAAA TCGAGAATGT 900 

GATCAACGAT GGGTTCTTTG GGTTCCGCTT ACCTTTGCTA CTCATGCGAG CCAGCAATGA 960 

GGGCCGACTT ATCGAGTTCA GTGTCAAGAG ATTCGAGTCG GCGCCACATC CACAGAACAA 1020 

GACATTGGTC AACCGGGCAT TGAGCGTTCC TGTGATACTC ACCAAGGACC CAGTGAATAT 1080 

CAAAGCGATG CTATCGACCC AGTTTGATGA CTTTTCCCTT GGGTTGAGAC TACACCAGTT 1140 

TGCGCCGTTG TTGGGGAAAG GCATCTTTAC TTTGGACGGC CCAGAGTGGA AGCAGAGCCG 1200 

ATCTATGTTG CGTCCGCAAT TTGCCAAAGA TCGGGTTTCT CATATCCTGG ATCTAGAACC 1260 

GCATTTTGTG TTGCTTCGGA AGCACATTGA TGGCCACAAT GGAGACTACT TCGACATCCA 1320 

GGAGCTCTAC TTCCGGTTCT CGATGGATGT GGCGACGGGG TTTTTGTTTG GCGAGTCTGT 1380 

GGGGTCGTTG AAAGACGAAG ATGCGAGGTT CCTGGAAGCA TTCAATGAGT CGCAGAAGTA 1440 

TTTGGCAACT AGGGCAACGT TGCACGAGTT GTACTTTCTT TGTGACGGGT TTAGGTTTCG 1500 

CCAGTACAAC AAGGTTGTGC GAAAGTTCTG CAGCCAGTGT GTCCACAAGG CGTTAGATGT 1560 

TGCACCGGAA GACACCAGCG AGTACGTGTT TCTCCGCGAG TTGGTCAAAC ACACTCGAGA 1620 

TCCCGTTGTT TTACAAGACC AAGCGTTGAA CGTCTTGCTT GCTGGACGCG ACACCACCGC 1680 

GTCGTTATTA TCGTTTGCAA CATTTGAGCT AGCCCGGAAT GACCACATGT GGAGGAAGCT 1740 

ACGAGAGGAG GTTATCCTGA CGATGGGACC GTCCAGTGAT GAAATAACCG TGGCCGGGTT 1800 

GAAGAGTTGC CGTTACCTCA AAGCAATCCT AAACGAAACT CTTCGACTAT ACCCAAGTGT 1860 

GCCTAGGAAC GCGAGATTTG CTACGAGGAA TACGACGCTT CCTCGTGGCG GAGGTCCAGA 1920 

TGGATCGTTT CCGATTTTGA TAAGAAAGGG CCAGCCAGTG GGGTATTTCA TTTGTGCTAC 1980 

ACACTTGAAT GAGAAGGTAT ATGGGAATGA TAGCCATGTG TTTCGACCGG AGAGATGGGC 2040 

TGCGTTAGAG GGCAAGAGTT TGGGCTGGTC GTATCTTCCA TTCAACGGCG GCCCGAGAAG 2100 

CTGCCTTGGT CAGCAGTTTG CAATCCTTGA AGCTTCGTAT GTTTTGGCTC GATTGACACA 2160 

GTGCTACACG ACGATACAGC TTAGAACTAC CGAGTACCCA CCAAAGAAAC TCGTTCATCT 2220 

CACGATGAGT CTTCTCAACG GGGTGTACAT CCGAACTAGA ACTTGATTAT GTGTTTATGG 2280 

TTAATCGGGG CAAAGCACTG CAAGTCATTG ATGTTTGTGG AAGCCCAGCA TTGGTGTTCC 2340

GGAGCATCAA TAACCAATGT CTTGAAGGGT TTGATTTTCT TGACCTTCTT CTTCCTGAGC 2400 

TTCTTTCCGT CAAACTTGTA CAGAATGGCC ATCATTTCAG GAACAACCAC GTACGACGGC 2460 

CGGTACCGCA TCTGGAGTAT CTCGCCGTCG TTCAAGTAGC ACGAAAACAG CAACGACGTC 2520 

ACCATCTGCT TCCCAATCTT GACACCCACA GATACCCCTG CGGCTTCATG GATCAAAAAC 2580 

GTCGGCAACC CCGCGTATAT GTCCATGTAA TTCTCCATGG CCACCTCCAT CAACACACTG 2640 

ATGGAGCGAC TGACGGTGCC ACCACTGCCC TCGGTTGAGT CAAGGCAGTA TGATGCCGGG 2700 

ATCCAGTACT CCAATGGGAA CCTCTGCACG GTGTCGCTGC AGTTTTTGAG GCGTATTTCG 2760 

ATCCATGATC GTTCTTTGGT GCTGTAGTAT AACGAGCTCT TGGTGTCCTT GAAATGGAAC 2820 

AGGTTGGATG TGTTGTTGAG TTTGTCTGCG TGCTTGGTTT GCAAGTCTTC GATCGAGCGT 2880 

AGTGAGTAGA CAGTTGGCGG GGGTGGTGGC TCGGGCTTTA TTCTGTGTTT GTGTTTCCTT 2940 

CTTAGTCTTG GAATGACGCT GTTATCGACG GTTCGTAGTA TAAGTAGCGC CAATATGAGA 3000 

ATGTATATCC GCATCACCCA AGACTCTTCA GCCTGTTACA ACGACTGAGG CTGTTGGCCG 3060 

TGTGACCAAT TGGTTTCTTT GGTGACCTAG ATTGGTCCCG CAGGGAAAGC AAGGGCTGCT 3120 

AGGGGGGCAT ACCAAACAAG GTCGTGTAAT CAGTATCTAT GGTGCTACCA TGTGTGTGGT 3180 

TGGGGGGAAA TTCCCGCATT TTTGTGTAAC GAAAGTTCTA GAAAGTTCTC GTGGGTTCTG 3240 

AGAATCTGCT GGAACCATCC ACCCGCATTT CCGTTGCCAA AGTGGGAAGA GCAATCAACC 3300 

CACCCTGCTT TGCCCAATCA GCCATTCCCC TGGGAATATA AATTCAAC 3348 

(2) INFORMATION FOR SEQ ID NO: 95: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 523 amino acids 

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: unknown
(ii) MOLECULE TYPE: protein



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 95: 



Met Ala Thr Gln Glu Ile Ile Asp Ser Val Leu Pro Tyr Leu Thr Lys
1 5 10 15

Trp Tyr Thr Val Ile Thr Ala Ala Val Leu Val Phe Leu Ile Ser Thr
20 25 30

Asn Ile Lys Asn Tyr Val Lys Ala Lys Lys Leu Lys Cys Val Asp Pro
35 40 45

Pro Tyr Leu Lys Asp Ala Gly Leu Thr Gly Ile Leu Ser Leu Ile Ala
50 55 60

Ala Ile Lys Ala Lys Asn Asp Gly Arg Leu Ala Asn Phe Ala Asp Glu
65 70 75 80

Val Phe Asp Glu Tyr Pro Asn His Thr Phe Tyr Leu Ser Val Ala Gly
85 90 95

Ala Leu Lys Ile Val Met Thr Val Asp Pro Glu Asn Ile Lys Ala Val
100 105 110

Leu Ala Thr Gln Phe Thr Asp Phe Ser Leu Gly Thr Arg His Ala His
115 120 125

Phe Ala Pro Leu Leu Gly Asp Gly Ile Phe Thr Leu Asp Gly Glu Gly
130 135 140

Trp Lys His Ser Arg Ala Met Leu Arg Pro Gln Phe Ala Arg Asp Gln
145 150 155 160

Ile Gly His Val Lys Ala Leu Glu Pro His Ile Gln Ile Met Ala Lys
165 170 175

Gln Ile Lys Leu Asn Gln Gly Lys Thr Phe Asp Ile Gln Glu Leu Phe
180 185 190

Phe Arg Phe Thr Val Asp Thr Ala Thr Glu Phe Leu Phe Gly Glu Ser
195 200 205

Val His Ser Leu Tyr Asp Glu Lys Leu Gly Ile Pro Thr Pro Asn Glu
210 215 220

Ile Pro Gly Arg Glu Asn Phe Ala Ala Ala Phe Asn Val Ser Gln His
225 230 235 240

Tyr Leu Ala Thr Arg Ser Tyr Ser Gln Thr Phe Tyr Phe Leu Thr Asn
245 250 255

Pro Lys Glu Phe Arg Asp Cys Asn Ala Lys Val His His Leu Ala Lys
260 265 270

Tyr Phe Val Asn Lys Ala Leu Asn Phe Thr Pro Glu Glu Leu Glu Glu
275 280 285

Lys Ser Lys Ser Gly Tyr Val Phe Leu Tyr Glu Leu Val Lys Gln Thr
290 295 300

Arg Asp Pro Lys Val Leu Gln Asp Gln Leu Leu Asn Ile Met Val Ala
305 310 315 320

Gly Arg Asp Thr Thr Ala Gly Leu Leu Ser Phe Ala Leu Phe Glu Leu
325 330 335

Ala Arg His Pro Glu Met Trp Ser Lys Leu Arg Glu Glu Ile Glu Val
340 345 350

Asn Phe Gly Val Gly Glu Asp Ser Arg Val Glu Glu Ile Thr Phe Glu
355 360 365

Ala Leu Lys Arg Cys Glu Tyr Leu Lys Ala Ile Leu Asn Glu Thr Leu
370 375 380

Arg Met Tyr Pro Ser Val Pro Val Asn Phe Arg Thr Ala Thr Arg Asp
385 390 395 400

Thr Thr Leu Pro Arg Gly Gly Gly Ala Asn Gly Thr Asp Pro Ile Tyr
405 410 415

Ile Pro Lys Gly Ser Thr Val Ala Tyr Val Val Tyr Lys Thr His Arg
420 425 430

Leu Glu Glu Tyr Tyr Gly Lys Asp [residues 441-448 illegible]
435 440 445

Trp Phe Glu Pro Ser Thr Lys Lys Leu Gly Trp Ala Tyr Val Pro Phe
450 455 460

Asn Gly Gly Pro Arg Val Cys Leu Gly Gln Gln Phe Ala Leu Thr Glu
465 470 475 480

Ala Ser Tyr Val Ile Thr Arg Leu Ala Gln Met Phe Glu Thr Val Ser
485 490 495

Ser Asp Pro Gly Leu Glu Tyr Pro Pro Pro Lys Cys Ile His Leu Thr
500 505 510

Met Ser His Asn Asp Gly Val Phe [residues 521-523 illegible]
515 520



(2) INFORMATION FOR SEQ ID NO: 96:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 522 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: unknown
(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 96:

Met Thr Val His Asp Ile Ile Ala Thr Tyr Phe Thr Lys Trp Tyr Val
1 5 10 15

Ile Val Pro Leu Ala Leu Ile Ala Tyr Arg Val Leu Asp Tyr Phe Tyr
20 25 30

Gly Arg Tyr Leu Met Tyr Lys Leu Gly Ala Lys Pro Phe Phe Gln Lys
35 40 45

Gln Thr Asp Gly Cys Phe Gly Phe Lys Ala Pro Leu Glu Leu Leu Lys
50 55 60

Lys Lys Ser Asp Gly Thr Leu Ile Asp Phe Thr Leu Gln Arg Ile His
65 70 75 80

Asp Leu Asp Arg Pro Asp Ile Pro Thr Phe Thr Phe Pro Val Phe Ser
85 90 95

Ile Asn Leu Val Asn Thr Leu Glu Pro Glu Asn Ile Lys Ala Ile Leu
100 105 110

Ala Thr Gln Phe Asn Asp Phe Ser Leu Gly Thr Arg His Ser His Phe
115 120 125

Ala Pro Leu Leu Gly Asp Gly Ile Phe Thr Leu Asp Gly Ala Gly Trp
130 135 140

Lys His Ser Arg Ser Met Leu Arg Pro Gln Phe Ala Arg Glu Gln Ile
145 150 155 160

Ser His Val Lys Leu Leu Glu Pro His Val Gln Val Phe Phe Lys His
165 170 175

Val Arg Lys Ala Gln Gly Lys Thr Phe Asp Ile Gln Glu Leu Phe Phe
180 185 190

Arg Leu Thr Val Asp Ser Ala Thr Glu Phe Leu Phe Gly Glu Ser Val
195 200 205

Glu Ser Leu Arg Asp Glu Ser Ile Gly Met Ser Ile Asn Ala Leu Asp
210 215 220

Phe Asp Gly Lys Ala Gly Phe Ala Asp Ala Phe Asn Tyr Ser Gln Asn
225 230 235 240

Tyr Leu Ala Ser Arg Ala Val Met Gln Gln Leu Tyr Trp Val Leu Asn
245 250 255

Gly Lys Lys Phe Lys Glu Cys Asn Ala Lys Val His Lys Phe Ala Asp
260 265 270

Tyr Tyr Val Asn Lys Ala Leu Asp Leu Thr Pro Glu Gln Leu Glu Lys
275 280 285

Gln Asp Gly Tyr Val Phe Leu Tyr Glu Leu Val Lys Gln Thr Arg Asp
290 295 300

Lys Gln Val Leu Arg Asp Gln Leu Leu Asn Ile Met Val Ala Gly Arg
305 310 315 320

Asp Thr Thr Ala Gly Leu Leu Ser Phe Val Phe Phe Glu Leu Ala Arg
325 330 335

Asn Pro Glu Val Thr Asn Lys Leu Arg Glu Glu Ile Glu Asp Lys Phe
340 345 350

Gly Leu Gly Glu Asn Ala Ser Val Glu Asp Ile Ser Phe Glu Ser Leu
355 360 365

Lys Ser Cys Glu Tyr Leu Lys Ala Val Leu Asn Glu Thr Leu Arg Leu
370 375 380

Tyr Pro Ser Val Pro Gln Asn Phe Arg Val Ala Thr Lys Asn Thr Thr
385 390 395 400

Leu Pro Arg Gly Gly Gly Lys Asp Gly Leu Ser Pro Val Leu Val Arg
405 410 415

Lys Gly Gln Thr Val Ile Tyr Gly Val Tyr Ala Ala His Arg Asn Pro
420 425 430

Ala Val Tyr Gly Lys Asp Ala Leu Glu Phe Arg Pro Glu Arg Trp Phe
435 440 445

Glu Pro Glu Thr Lys Lys Leu Gly Trp Ala Phe Leu Pro Phe Asn Gly
450 455 460

Gly Pro Arg Ile Cys Leu Gly Gln Gln Phe Ala Leu Thr Glu Ala Ser
465 470 475 480

Tyr Val Thr Val Arg Leu Leu Gln Glu Phe Ala His Leu Ser Met Asp
485 490 495

Pro Asp Thr Glu Tyr Pro Pro Lys Lys Met Ser His Leu Thr Met Ser
500 505 510

Leu Phe Asp Gly Ala Asn Ile Glu Met Tyr
515 520

(2) INFORMATION FOR SEQ ID NO: 97: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 522 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: unknown 
(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 97: 

Met Thr Ala Gln Asp Ile Ile Ala Thr Tyr Ile Thr Lys Trp Tyr Val
1 5 10 15

Ile Val Pro Leu Ala Leu Ile Ala Tyr Arg Val Leu Asp Tyr Phe Tyr
20 25 30

Gly Arg Tyr Leu Met Tyr Lys Leu Gly Ala Lys Pro Phe Phe Gln Lys
35 40 45

Gln Thr Asp Gly Tyr Phe Gly Phe Lys Ala Pro Leu Glu Leu Leu Lys
50 55 60

Lys Lys Ser Asp Gly Thr Leu Ile Asp Phe Thr Leu Glu Arg Ile Gln
65 70 75 80

Ala Leu Asn Arg Pro Asp Ile Pro Thr Phe Thr Phe Pro Ile Phe Ser
85 90 95

Ile Asn Leu Ile Ser Thr Leu Glu Pro Glu Asn Ile Lys Ala Ile Leu
100 105 110

Ala Thr Gln Phe Asn Asp Phe Ser Leu Gly Thr Arg His Ser His Phe
115 120 125

Ala Pro Leu Leu Gly Asp Gly Ile Phe Thr Leu Asp Gly Ala Gly Trp
130 135 140

Lys His Ser Arg Ser Met Leu Arg Pro Gln Phe Ala Arg Glu Gln Ile
145 150 155 160

Ser His Val Lys Leu Leu Glu Pro His Met Gln Val Phe Phe Lys His
165 170 175

Val Arg Lys Ala Gln Gly Lys Thr Phe Asp Ile Gln Glu Leu Phe Phe
180 185 190

Arg Leu Thr Val Asp Ser Ala Thr Glu Phe Leu Phe Gly Glu Ser Val
195 200 205

Glu Ser Leu Arg Asp Glu Ser Ile Gly Met Ser Ile Asn Ala Leu Asp
210 215 220

Phe Asp Gly Lys Ala Gly Phe Ala Asp Ala Phe Asn Tyr Ser Gln Asn
225 230 235 240

Tyr Leu Ala Ser Arg Ala Val Met Gln Gln Leu Tyr Trp Val Leu Asn
245 250 255

Gly Lys Lys Phe Lys Glu Cys Asn Ala Lys Val His Lys Phe Ala Asp
260 265 270

Tyr Tyr Val Ser Lys Ala Leu Asp Leu Thr Pro Glu Gln Leu Glu Lys
275 280 285

Gln Asp Gly Tyr Val Phe Leu Tyr Glu Leu Val Lys Gln Thr Arg Asp
290 295 300

Arg Gln Val Leu Arg Asp Gln Leu Leu Asn Ile Met Val Ala Gly Arg
305 310 315 320

Asp Thr Thr Ala Gly Leu Leu Ser Phe Val Phe Phe Glu Leu Ala Arg
325 330 335

Asn Pro Glu Val Thr Asn Lys Leu Arg Glu Glu Ile Glu Asp Lys Phe
340 345 350

Gly Leu Gly Glu Asn Ala Arg Val Glu Asp Ile Ser Phe Glu Ser Leu
355 360 365

Lys Ser Cys Glu Tyr Leu Lys Ala Val Leu Asn Glu Thr Leu Arg Leu
370 375 380

Tyr Pro Ser Val Pro Gln Asn Phe Arg Val Ala Thr Lys Asn Thr Thr
385 390 395 400

Leu Pro Arg Gly Gly Gly Lys Asp Gly Leu Ser Pro Val Leu Val Arg
405 410 415

Lys Gly Gln Thr Val Met Tyr Gly Val Tyr Ala Ala His Arg Asn Pro
420 425 430

Ala Val Tyr Gly Lys Asp Ala Leu Glu Phe Arg Pro Glu Arg Trp Phe
435 440 445

Glu Pro Glu Thr Lys Lys Leu Gly Trp Ala Phe Leu Pro Phe Asn Gly
450 455 460

Gly Pro Arg Ile Cys Leu Gly Gln Gln Phe Ala Leu Thr Glu Ala Ser
465 470 475 480

Tyr Val Thr Val Arg Leu Leu Gln Glu Phe Gly His Leu Ser Met Asp
485 490 495

Pro Asn Thr Glu Tyr Pro Pro Arg Lys Met Ser His Leu Thr Met Ser
500 505 510

Leu Phe Asp Gly Ala Asn Ile Glu Met Tyr
515 520









(2) INFORMATION FOR SEQ ID NO: 98:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 540 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: unknown 
(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 98:

Met Ser Ser Ser Pro Ser Phe Ala Gln Glu Val Leu Ala Thr Thr Ser
1 5 10 15

Pro Tyr Ile Glu Tyr Phe Leu Asp Asn Tyr Thr Arg Trp Tyr Tyr Phe
20 25 30

Ile Pro Leu Val Leu Leu Ser Leu Asn Phe Ile Ser Leu Leu His Thr
35 40 45

Arg Tyr Leu Glu Arg Arg Phe His Ala Lys Pro Leu Gly Asn Phe Val
50 55 60

Arg Asp Pro Thr Phe Gly Ile Ala Thr Pro Leu Leu Leu Ile Tyr Leu
65 70 75 80

Lys Ser Lys Gly Thr Val Met Lys Phe Ala Trp Gly Leu Trp Asn Asn
85 90 95

Lys Tyr Ile Val Arg Asp Pro Lys Tyr Lys Thr Thr Gly Leu Arg Ile
100 105 110

Val Gly Leu Pro Leu Ile Glu Thr Met Asp Pro Glu Asn Ile Lys Ala
115 120 125

Val Leu Ala Thr Gln Phe Asn Asp Phe Ser Leu Gly Thr Arg His Asp
130 135 140

Phe Leu Tyr Ser Leu Leu Gly Asp Gly Ile Phe Thr Leu Asp Gly Ala
145 150 155 160

Gly Trp Lys His Ser Arg Thr Met Leu Arg Pro Gln Phe Ala Arg Glu
165 170 175

Gln Val Ser His Val Lys Leu Leu Glu Pro His Val Gln Val Phe Phe
180 185 190

Lys His Val Arg Lys His Arg Gly Gln Thr Phe Asp Ile Gln Glu Leu
195 200 205

Phe Phe Arg Leu Thr Val Asp Ser Ala Thr Glu Phe Leu Phe Gly Glu
210 215 220

Ser Ala Glu Ser Leu Arg Asp Glu Ser Ile Gly Leu Thr Pro Thr Thr
225 230 235 240

Lys Asp Phe Asp Gly Arg Arg Asp Phe Ala Asp Ala Phe Asn Tyr Ser
245 250 255

Gln Thr Tyr Gln Ala Tyr Arg Phe Leu Leu Gln Gln Met Tyr Trp Ile
260 265 270

Leu Asn Gly Ser Glu Phe Arg Lys Ser Ile Ala Val Val His Lys Phe
275 280 285

Ala Asp His Tyr Val Gln Lys Ala Leu Glu Leu Thr Asp Asp Asp Leu
290 295 300

Gln Lys Gln Asp Gly Tyr Val Phe Leu Tyr Glu Leu Ala Lys Gln Thr
305 310 315 320

Arg Asp Pro Lys Val Leu Arg Asp Gln Leu Leu Asn Ile Leu Val Ala
325 330 335

Gly Arg Asp Thr Thr Ala Gly Leu Leu Ser Phe Val Phe Tyr Glu Leu
340 345 350

Ser Arg Asn Pro Glu Val Phe Ala Lys Leu Arg Glu Glu Val Glu Asn
355 360 365

Arg Phe Gly Leu Gly Glu Glu Ala Arg Val Glu Glu Ile Ser Phe Glu
370 375 380

Ser Leu Lys Ser Cys Glu Tyr Leu Lys Ala Val Ile Asn Glu Thr Leu
385 390 395 400

Arg Leu Tyr Pro Ser Val Pro His Asn Phe Arg Val Ala Thr Arg Asn
405 410 415

Thr Thr Leu Pro Arg Gly Gly Gly Glu Asp Gly Tyr Ser Pro Ile Val
420 425 430

Val Lys Lys Gly Gln Val Val Met Tyr Thr Val Ile Ala Thr His Arg
435 440 445

Asp Pro Ser Ile Tyr Gly Ala Asp Ala Asp Val Phe Arg Pro Glu Arg
450 455 460

Trp Phe Glu Pro Glu Thr Arg Lys Leu Gly Trp Ala Tyr Val Pro Phe
465 470 475 480

Asn Gly Gly Pro Arg Ile Cys Leu Gly Gln Gln Phe Ala Leu Thr Glu
485 490 495

Ala Ser Tyr Val Thr Val Arg Leu Leu Gln Glu Phe Ala His Leu Ser
500 505 510

Met Asp Pro Asp Thr Glu Tyr Pro Pro Lys Leu Gln Asn Thr Leu Thr
515 520 525

Leu Ser Leu Phe Asp Gly Ala Asp Val Arg Met Tyr
530 535 540



(2) INFORMATION FOR SEQ ID NO: 99:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 540 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY : unknown 
(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 99: 



Met Ser Ser Ser Pro Ser Phe Ala Gln Glu Val Leu Ala Thr Thr Ser
1 5 10 15

Pro Tyr Ile Glu Tyr Phe Leu Asp Asn Tyr Thr Arg Trp Tyr Tyr Phe
20 25 30

Ile Pro Leu Val Leu Leu Ser Leu Asn Phe Ile Ser Leu Leu His Thr
35 40 45

Lys Tyr Leu Glu Arg Arg Phe His Ala Lys Pro Leu Gly Asn Val Val
50 55 60

Leu Asp Pro Thr Phe Gly Ile Ala Thr Pro Leu Ile Leu Ile Tyr Leu
65 70 75 80

Lys Ser Lys Gly Thr Val Met Lys Phe Ala Trp Ser Phe Trp Asn Asn
85 90 95

Lys Tyr Ile Val Lys Asp Pro Lys Tyr Lys Thr Thr Gly Leu Arg Ile
100 105 110

Val Gly Leu Pro Leu Ile Glu Thr Ile Asp Pro Glu Asn Ile Lys Ala
115 120 125

Val Leu Ala Thr Gln Phe Asn Asp Phe Ser Leu Gly Thr Arg His Asp
130 135 140

Phe Leu Tyr Ser Leu Leu Gly Asp Gly Ile Phe Thr Leu Asp Gly Ala
145 150 155 160

Gly Trp Lys His Ser Arg Thr Met Leu Arg Pro Gln Phe Ala Arg Glu
165 170 175

Gln Val Ser His Val Lys Leu Leu Glu Pro His Val Gln Val Phe Phe
180 185 190

Lys His Val Arg Lys His Arg Gly Gln Thr Phe Asp Ile Gln Glu Leu
195 200 205

Phe Phe Arg Leu Thr Val Asp Ser Ala Thr Glu Phe Leu Phe Gly Glu
210 215 220

Ser Ala Glu Ser Leu Arg Asp Asp Ser Val Gly Leu Thr Pro Thr Thr
225 230 235 240

Lys Asp Phe Glu Gly Arg Gly Asp Phe Ala Asp Ala Phe Asn Tyr Ser
245 250 255

Gln Thr Tyr Gln Ala Tyr Arg Phe Leu Leu Gln Gln Met Tyr Trp Ile
260 265 270

Leu Asn Gly Ala Glu Phe Arg Lys Ser Ile Ala Ile Val His Lys Phe
275 280 285

Ala Asp His Tyr Val Gln Lys Ala Leu Glu Leu Thr Asp Asp Asp Leu
290 295 300

Gln Lys Gln Asp Gly Tyr Val Phe Leu Tyr Glu Leu Ala Lys Gln Thr
305 310 315 320

Arg Asp Pro Lys Val Leu Arg Asp Gln Leu Leu Asn Ile Leu Val Ala
325 330 335

Gly Arg Asp Thr Thr Ala Gly Leu Leu Ser Phe Val Phe Tyr Glu Leu
340 345 350

Ser Arg Asn Pro Glu Val Phe Ala Lys Leu Arg Glu Glu Val Glu Asn
355 360 365

Arg Phe Gly Leu Gly Glu Glu Ala Arg Val Glu Glu Ile Ser Phe Glu
370 375 380

Ser Leu Lys Ser Cys Glu Tyr Leu Lys Ala Val Ile Asn Glu Ala Leu
385 390 395 400

Arg Leu Tyr Pro Ser Val Pro His Asn Phe Arg Val Ala Thr Arg Asn
405 410 415

Thr Thr Leu Pro Arg Gly Gly Gly Lys Asp Gly Cys Ser Pro Ile Val
420 425 430

Val Lys Lys Gly Gln Val Val Met Tyr Thr Val Ile Gly Thr His Arg
435 440 445

Asp Pro Ser Ile Tyr Gly Ala Asp Ala Asp Val Phe Arg Pro Glu Arg
450 455 460

Trp Phe Glu Pro Glu Thr Arg Lys Leu Gly Trp Ala Tyr Val Pro Phe
465 470 475 480

Asn Gly Gly Pro Arg Ile Cys Leu Gly Gln Gln Phe Ala Leu Thr Glu
485 490 495

Ala Ser Tyr Val Thr Val Arg Leu Leu Gln Glu Phe Gly Asn Leu Ser
500 505 510

Leu Asp Pro Asn Ala Glu Tyr Pro Pro Lys Leu Gln Asn Thr Leu Thr
515 520 525

Leu Ser Leu Phe Asp Gly Ala Asp Val Arg Met Phe
530 535 540

(2) INFORMATION FOR SEQ ID NO: 100:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH : 517 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: unknown 
(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:100: 

Met Ile Glu Gln Leu Leu Glu Tyr Trp Tyr Val Val Val Pro Val Leu
1 5 10 15

Tyr Ile Ile Lys Gln Leu Leu Ala Tyr Thr Lys Thr Arg Val Leu Met
20 25 30

Lys Lys Leu Gly Ala Ala Pro Val Thr Asn Lys Leu Tyr Asp Asn Ala
35 40 45

Phe Gly Ile Val Asn Gly Trp Lys Ala Leu Gln Phe Lys Lys Glu Gly
50 55 60

Arg Ala Gln Glu Tyr Asn Asp Tyr Lys Phe Asp His Ser Lys Asn Pro
65 70 75 80

Ser Val Gly Thr Tyr Val Ser Ile Leu Phe Gly Thr Arg Ile Val Val
85 90 95

Thr Lys Asp Pro Glu Asn Ile Lys Ala Ile Leu Ala Thr Gln Phe Gly
100 105 110

Asp Phe Ser Leu Gly Lys Arg His Thr Leu Phe Lys Pro Leu Leu Gly
115 120 125

Asp Gly Ile Phe Thr Leu Asp Gly Glu Gly Trp Lys His Ser Arg Ala
130 135 140

Met Leu Arg Pro Gln Phe Ala Arg Glu Gln Val Ala His Val Thr Ser
145 150 155 160

Leu Glu Pro His Phe Gln Leu Leu Lys Lys His Ile Leu Lys His Lys
165 170 175

Gly Glu Tyr Phe Asp Ile Gln Glu Leu Phe Phe Arg Phe Thr Val Asp
180 185 190

Ser Ala Thr Glu Phe Leu Phe Gly Glu Ser Val His Ser Leu Lys Asp
195 200 205

Glu Ser Ile Gly Ile Asn Gln Asp Asp Ile Asp Phe Ala Gly Arg Lys
210 215 220

Asp Phe Ala Glu Ser Phe Asn Lys Ala Gln Glu Tyr Leu Ala Ile Arg
225 230 235 240

Thr Leu Val Gln Thr Phe Tyr Trp Leu Val Asn Asn Lys Glu Phe Arg
245 250 255

Asp Cys Thr Lys Leu Val His Lys Phe Thr Asn Tyr Tyr Val Gln Lys
260 265 270

Ala Leu Asp Ala Ser Pro Glu Glu Leu Glu Lys Gln Ser Gly Tyr Val
275 280 285

Phe Leu Tyr Glu Leu Val Lys Gln Thr Arg Asp Pro Asn Val Leu Arg
290 295 300

Asp Gln Ser Leu Asn Ile Leu Leu Ala Gly Arg Asp Thr Thr Ala Gly
305 310 315 320

Leu Leu Ser Phe Ala Val Phe Glu Leu Ala Arg His Pro Glu Ile Trp
325 330 335

Ala Lys Leu Arg Glu Glu Ile Glu Gln Gln Phe Gly Leu Gly Glu Asp
340 345 350

Ser Arg Val Glu Glu Ile Thr Phe Glu Ser Leu Lys Arg Cys Glu Tyr
355 360 365

Leu Lys Ala Phe Leu Asn Glu Thr Leu Arg Ile Tyr Pro Ser Val Pro
370 375 380

Arg Asn Phe Arg Ile Ala Thr Lys Asn Thr Thr Leu Pro Arg Gly Gly
385 390 395 400

Gly Ser Asp Gly Thr Ser Pro Ile Leu Ile Gln Lys Gly Glu Ala Val
405 410 415

Ser Tyr Gly Ile Asn Ser Thr His Leu Asp Pro Val Tyr Tyr Gly Pro
420 425 430

Asp Ala Ala Glu Phe Arg Pro Glu Arg Trp Phe Glu Pro Ser Thr Lys
435 440 445

Lys Leu Gly Trp Ala Tyr Leu Pro Phe Asn Gly Gly Pro Arg Ile Cys
450 455 460

Leu Gly Gln Gln Phe Ala Leu Thr Glu Ala Gly Tyr Val Leu Val Arg
465 470 475 480

Leu Val Gln Glu Phe Ser His Val Arg Leu Asp Pro Asp Glu Val Tyr
485 490 495

Pro Pro Lys Arg Leu Thr Asn Leu Thr Met Cys Leu Gln Asp Gly Ala
500 505 510

Ile Val Lys Phe Asp
515

(2) INFORMATION FOR SEQ ID NO: 101:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 517 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 101: 



Met Ile Glu Gln Ile Leu Glu Tyr Trp Tyr Ile Val Val Pro Val Leu
1 5 10 15

Tyr Ile Ile Lys Gln Leu Ile Ala Tyr Ser Lys Thr Arg Val Leu Met
20 25 30

Lys Gln Leu Gly Ala Ala Pro Ile Thr Asn Gln Leu Tyr Asp Asn Val
35 40 45

Phe Gly Ile Val Asn Gly Trp Lys Ala Leu Gln Phe Lys Lys Glu Gly
50 55 60

Arg Ala Gln Glu Tyr Asn Asp His Lys Phe Asp Ser Ser Lys Asn Pro
65 70 75 80

Ser Val Gly Thr Tyr Val Ser Ile Leu Phe Gly Thr Lys Ile Val Val
85 90 95

Thr Lys Asp Pro Glu Asn Ile Lys Ala Ile Leu Ala Thr Gln Phe Gly
100 105 110

Asp Phe Ser Leu Gly Lys Arg His Ala Leu Phe Lys Pro Leu Leu Gly
115 120 125

Asp Gly Ile Phe Thr Leu Asp Gly Glu Gly Trp Lys His Ser Arg Ser
130 135 140

Met Leu Arg Pro Gln Phe Ala Arg Glu Gln Val Ala His Val Thr Ser
145 150 155 160

Leu Glu Pro His Phe Gln Leu Leu Lys Lys His Ile Leu Lys His Lys
165 170 175

Gly Glu Tyr Phe Asp Ile Gln Glu Leu Phe Phe Arg Phe Thr Val Asp
180 185 190

Ser Ala Thr Glu Phe Leu Phe Gly Glu Ser Val His Ser Leu Lys Asp
195 200 205

Glu Thr Ile Gly Ile Asn Gln Asp Asp Ile Asp Phe Ala Gly Arg Lys
210 215 220

Asp Phe Ala Glu Ser Phe Asn Lys Ala Gln Glu Tyr Leu Ser Ile Arg
225 230 235 240

Ile Leu Val Gln Thr Phe Tyr Trp Leu Ile Asn Asn Lys Glu Phe Arg
245 250 255

Asp Cys Thr Lys Leu Val His Lys Phe Thr Asn Tyr Tyr Val Gln Lys
260 265 270

Ala Leu Asp Ala Thr Pro Glu Glu Leu Glu Lys Gln Gly Gly Tyr Val
275 280 285

Phe Leu Tyr Glu Leu Val Lys Gln Thr Arg Asp Pro Lys Val Leu Arg
290 295 300

Asp Gln Ser Leu Asn Ile Leu Leu Ala Gly Arg Asp Thr Thr Ala Gly
305 310 315 320

Leu Leu Ser Phe Ala Val Phe Glu Leu Ala Arg Asn Pro His Ile Trp
325 330 335

Ala Lys Leu Arg Glu Glu Ile Glu Gln Gln Phe Gly Leu Gly Glu Asp
340 345 350

Ser Arg Val Glu Glu Ile Thr Phe Glu Ser Leu Lys Arg Cys Glu Tyr
355 360 365

Leu Lys Ala Phe Leu Asn Glu Thr Leu Arg Val Tyr Pro Ser Val Pro
370 375 380

Arg Asn Phe Arg Ile Ala Thr Lys Asn Thr Thr Leu Pro Arg Gly Gly
385 390 395 400

Gly Pro Asp Gly Thr Gln Pro Ile Leu Ile Gln Lys Gly Glu Gly Val
405 410 415

Ser Tyr Gly Ile Asn Ser Thr His Leu Asp Pro Val Tyr Tyr Gly Pro
420 425 430

Asp Ala Ala Glu Phe Arg Pro Glu Arg Trp Phe Glu Pro Ser Thr Arg
435 440 445

Lys Leu Gly Trp Ala Tyr Leu Pro Phe Asn Gly Gly Pro Arg Ile Cys
450 455 460

Leu Gly Gln Gln Phe Ala Leu Thr Glu Ala Gly Tyr Val Leu Val Arg
465 470 475 480

Leu Val Gln Glu Phe Ser His Ile Arg Leu Asp Pro Asp Glu Val Tyr
485 490 495

Pro Pro Lys Arg Leu Thr Asn Leu Thr Met Cys Leu Gln Asp Gly Ala
500 505 510

Ile Val Lys Phe Asp
515

(2) INFORMATION FOR SEQ ID NO: 102: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 512 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: unknown 
(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 102: 

Met Leu Asp Gln Ile Leu His Tyr Trp Tyr Ile Val Leu Pro Leu Leu
1 5 10 15

Ala Ile Ile Asn Gln Ile Val Ala His Val Arg Thr Asn Tyr Leu Met
20 25 30

Lys Lys Leu Gly Ala Lys Pro Phe Thr His Val Gln Arg Asp Gly Trp
35 40 45

Leu Gly Phe Lys Phe Gly Arg Glu Phe Leu Lys Ala Lys Ser Ala Gly
50 55 60

Arg Leu Val Asp Leu Ile Ile Ser Arg Phe His Asp Asn Glu Asp Thr
65 70 75 80

Phe Ser Ser Tyr Ala Phe Gly Asn His Val Val Phe Thr Arg Asp Pro
85 90 95

Glu Asn Ile Lys Ala Leu Leu Ala Thr Gln Phe Gly Asp Phe Ser Leu
100 105 110

Gly Ser Arg Val Lys Phe Phe Lys Pro Leu Leu Gly Tyr Gly Ile Phe
115 120 125

Thr Leu Asp Ala Glu Gly Trp Lys His Ser Arg Ala Met Leu Arg Pro
130 135 140

Gln Phe Ala Arg Glu Gln Val Ala His Val Thr Ser Leu Glu Pro His
145 150 155 160

Phe Gln Leu Leu Lys Lys His Ile Leu Lys His Lys Gly Glu Tyr Phe
165 170 175

Asp Ile Gln Glu Leu Phe Phe Arg Phe Thr Val Asp Ser Ala Thr Glu
180 185 190

Phe Leu Phe Gly Glu Ser Val His Ser Leu Lys Asp Glu Glu Ile Gly
195 200 205

Tyr Asp Thr Lys Asp Met Ser Glu Glu Arg Arg Arg Phe Ala Asp Ala
210 215 220










Phe 


Asn 


Lys 


Ser 


cm 


vai 1 yX 


Val 


4vJLa /vi y v ax 


Ala 


Leu 


Gin 




225 








230 












240 
A v w 


Leu 


Tyr 


Trp 


Leu 


Val 


Asn Asn 


Lys 


giu irne jjys vj-L u. 


Cys Asn 




lie 






245 






1 C rt 

250 










Val His Lys Phe Thr Asn Tyr Tyr Val Gln Lys Ala Leu Asp Ala Thr
260 265 270

Pro Glu Glu Leu Glu Lys Gln Gly Gly Tyr Val Phe Leu Tyr Glu Leu
275 280 285

Val Lys Gln Thr Arg Asp Pro Lys Val Leu Arg Asp Gln Ser Leu Asn
290 295 300

Ile Leu Leu Ala Gly Arg Asp Thr Thr Ala Gly Leu Leu Ser Phe Ala
305 310 315 320

Val Phe Glu Leu Ala Arg Asn Pro His Ile Trp Ala Lys Leu Arg Glu
325 330 335

Glu Ile Glu Gln Gln Phe Gly Leu Gly Glu Asp Ser Arg Val Glu Glu
340 345 350

Ile Thr Phe Glu Ser Leu Lys Arg Cys Glu Tyr Leu Lys Ala Val Leu
355 360 365

Asn Glu Thr Leu Arg Leu His Pro Ser Val Pro Arg Asn Ala Arg Phe
370 375 380

Ala Ile Lys Asp Thr Thr Leu Pro Arg Gly Gly Gly Pro Asn Gly Lys
385 390 395 400

Asp Pro Ile Leu Ile Arg Lys Asp Glu Val Val Gln Tyr Ser Ile Ser
405 410 415

Ala Thr Gln Thr Asn Pro Ala Tyr Tyr Gly Ala Asp Ala Ala Asp Phe
420 425 430

Arg Pro Glu Arg Trp Phe Glu Pro Ser Thr Arg Asn Leu Gly Trp Ala
435 440 445

Phe Leu Pro Phe Asn Gly Gly Pro Arg Ile Cys Leu Gly Gln Gln Phe
450 455 460

Ala Leu Thr Glu Ala Gly Tyr Val Leu Val Arg Leu Val Gln Glu Phe
465 470 475 480

Pro Asn Leu Ser Gln Asp Pro Glu Thr Lys Tyr Pro Pro Pro Arg Leu
485 490 495

Ala His Leu Thr Met Cys Leu Phe Asp Gly Ala His Val Lys Met Ser
500 505 510







(2) INFORMATION FOR SEQ ID NO: 103:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 512 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: unknown 
(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:103: 
Met Leu Asp Gln Ile Phe His Tyr Trp Tyr Ile Val Leu Pro Leu Leu
1 5 10 15

Val Ile Ile Lys Gln Ile Val Ala His Ala Arg Thr Asn Tyr Leu Met
20 25 30

Lys Lys Leu Gly Ala Lys Pro Phe Thr His Val Gln Leu Asp Gly Trp
35 40 45

Phe Gly Phe Lys Phe Gly Arg Glu Phe Leu Lys Ala Lys Ser Ala Gly
50 55 60

Arg Gln Val Asp Leu Ile Ile Ser Arg Phe His Asp Asn Glu Asp Thr
65 70 75 80

Phe Ser Ser Tyr Ala Phe Gly Asn His Val Val Phe Thr Arg Asp Pro
85 90 95

Glu Asn Ile Lys Ala Leu Leu Ala Thr Gln Phe Gly Asp Phe Ser Leu
100 105 110

Gly Ser Arg Val Lys Phe Phe Lys Pro Leu Leu Gly Tyr Gly Ile Phe
115 120 125

Thr Leu Asp Gly Glu Gly Trp Lys His Ser Arg Ala Met Leu Arg Pro
130 135 140

Gln Phe Ala Arg Glu Gln Val Ala His Val Thr Ser Leu Glu Pro His
145 150 155 160

Phe Gln Leu Leu Lys Lys His Ile Leu Lys His Lys Gly Glu Tyr Phe
165 170 175

Asp Ile Gln Glu Leu Phe Phe Arg Phe Thr Val Asp Ser Ala Thr Glu
180 185 190

Phe Leu Phe Gly Glu Ser Val His Ser Leu Arg Asp Glu Glu Ile Gly
195 200 205

Tyr Asp Thr Lys Asp Met Ala Glu Glu Arg Arg Lys Phe Ala Asp Ala
210 215 220

Phe Asn Lys Ser Gln Val Tyr Leu Ser Thr Arg Val Ala Leu Gln Thr
225 230 235 240

Leu Tyr Trp Leu Val Asn Asn Lys Glu Phe Lys Glu Cys Asn Asp Ile
245 250 255

Val His Lys Phe Thr Asn Tyr Tyr Val Gln Lys Ala Leu Asp Ala Thr
260 265 270

Pro Glu Glu Leu Glu Lys Gln Gly Gly Tyr Val Phe Leu Tyr Glu Leu
275 280 285

Ala Lys Gln Thr Lys Asp Pro Asn Val Leu Arg Asp Gln Ser Leu Asn
290 295 300

Ile Leu Leu Ala Gly Arg Asp Thr Thr Ala Gly Leu Leu Ser Phe Ala
305 310 315 320

Val Phe Glu Leu Ala Arg Asn Pro His Ile Trp Ala Lys Leu Arg Glu
325 330 335

Glu Ile Glu Ser His Phe Gly Leu Gly Glu Asp Ser Arg Val Glu Glu
340 345 350

Ile Thr Phe Glu Ser Leu Lys Arg Cys Glu Tyr Leu Lys Ala Val Leu
355 360 365

Asn Glu Thr Leu Arg Leu His Pro Ser Val Pro Arg Asn Ala Arg Phe
370 375 380

Ala Ile Lys Asp Thr Thr Leu Pro Arg Gly Gly Gly Pro Asn Gly Lys
385 390 395 400

Asp Pro Ile Leu Ile Arg Lys Asn Glu Val Val Gln Tyr Ser Ile Ser
405 410 415

Ala Thr Gln Thr Asn Pro Ala Tyr Tyr Gly Ala Asp Ala Ala Asp Phe
420 425 430

Arg Pro Glu Arg Trp Phe Glu Pro Ser Thr Arg Asn Leu Gly Trp Ala
435 440 445

Tyr Leu Pro Phe Asn Gly Gly Pro Arg Ile Cys Leu Gly Gln Gln Phe
450 455 460

Ala Leu Thr Glu Ala Gly Tyr Val Leu Val Arg Leu Val Gln Glu Phe
465 470 475 480

Pro Ser Leu Ser Gln Asp Pro Glu Thr Glu Tyr Pro Pro Pro Arg Leu
485 490 495

Ala His Leu Thr Met Cys Leu Phe Asp Gly Ala Tyr Val Lys Met Gln
500 505 510

(2) INFORMATION FOR SEQ ID NO: 104: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 499 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: unknown 
(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 104:


Val 




Met 


Ala 


lie 


Ser 


Ser 


Leu Leu 


Ser Trp 


Asp Val He Cys Val 


Phe 


1 








5 






10 


15 




Ile Cys Val Cys Val Tyr Phe Gly Tyr Glu Tyr Cys Tyr Thr Lys Tyr
20 25 30

Leu Met His Lys His Gly Ala Arg Glu Ile Glu Asn Val Ile Asn Asp
35 40 45

Gly Phe Phe Gly Phe Arg Leu Pro Leu Leu Leu Met Arg Ala Ser Asn
50 55 60

Glu Gly Arg Leu Ile Glu Phe Ser Val Lys Arg Phe Glu Ser Ala Pro
65 70 75 80

His Pro Gln Asn Lys Thr Leu Val Asn Arg Ala Leu Ser Val Pro Val
85 90 95

Ile Leu Thr Lys Asp Pro Val Asn Ile Lys Ala Met Leu Ser Thr Gln
100 105 110

Phe Asp Asp Phe Ser Leu Gly Leu Arg Leu His Gln Phe Ala Pro Leu
115 120 125

Leu Gly Lys Gly Ile Phe Thr Leu Asp Gly Pro Glu Trp Lys Gln Ser
130 135 140

Arg Ser Met Leu Arg Pro Gln Phe Ala Lys Asp Arg Val Ser His Ile
145 150 155 160

Leu Asp Leu Glu Pro His Phe Val Leu Leu Arg Lys His Ile Asp Gly
165 170 175

His Asn Gly Asp Tyr Phe Asp Ile Gln Glu Leu Tyr Phe Arg Phe Ser
180 185 190

Met Asp Val Ala Thr Gly Phe Leu Phe Gly Glu Ser Val Gly Ser Leu
195 200 205

Lys Asp Glu Asp Ala Arg Phe Leu Glu Ala Phe Asn Glu Ser Gln Lys
210 215 220

Tyr Leu Ala Thr Arg Ala Thr Leu His Glu Leu Tyr Phe Leu Cys Asp
225 230 235 240

Gly Phe Arg Phe Arg Gln Tyr Asn Lys Val Val Arg Lys Phe Cys Ser
245 250 255

Gln Cys Val His Lys Ala Leu Asp Val Ala Pro Glu Asp Thr Ser Glu
260 265 270

Tyr Val Phe Leu Arg Glu Leu Val Lys His Thr Arg Asp Pro Val Val
275 280 285

Leu Gln Asp Gln Ala Leu Asn Val Leu Leu Ala Gly Arg Asp Thr Thr
290 295 300

Ala Ser Leu Leu Ser Phe Ala Thr Phe Glu Leu Ala Arg Asn Asp His
305 310 315 320

Met Trp Arg Lys Leu Arg Glu Glu Val Ile Leu Thr Met Gly Pro Ser
325 330 335


Ser


Asp 


Glu 


He 


Thr 


Val 


Ala 


Gly 


Leu Lys 


ser cys Arg *yr 




uyf* 






340 










345 








Ala 


He 


Leu 


Asn 


Glu 


Thr 


Leu 


Arg 


Leu Tyr 


rlO Oci Vdl riw 




AoU 






355 










J o u 




365 

www 






Ala 


Arg 


Phe 


Ala 


Thr 


Arg 


Asn 


Tnr 


Tnr l^eu 


Fro Arg uiy uiy 


uiy 


XrlU 




370 










*» e 












Asp 


Gly 


Ser 


Phe 


pro 


He 


Leu 


lie 


Arg jjys 


uiy uin ±r±v vox 


ui y 


iyr 


385 








4 n n 

390 












400 


Phe 


He 


Cys 


Ala 


Thr 


HIS 


Leu 


Asn 


uxu jjys 


Val l y t uiy jusu 












405 












t!3 




His 


_ _ _ ^ 

val 


Phe 


Arg 


Pro 


Glu 


Arg 


xrp 


Ala Ala 


liCU UlU Uiy AJjrO 


Car 


Leu 








420 










425 


430 






Gly 


Trp 


Ser 


Tyr 


Leu 


Pro 


Phe 


Asn 


Gly Gly 


Pro Arg Ser Cys 


Leu 


Gly 


435 










440 




445 






Gin 


Gin 


Phe 


Ala 


He 


Leu 


Glu 


Ala 


Ser Tyr 


Val Leu Ala Arg 


Leu 


Thr 




450 










455 






460 






Gin 


Cys 


Tyr 


Thr 


Thr 


He 


Gin 


Leu 


Arg Thr 


Thr Glu Tyr Pro 


Pro

(2) INFORMATION FOR SEQ ID NO: 105:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1712 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 105:

GGTACCGAGC TCACGAGTTT TGGGATTTTC GAGTTTGGAT TGTTTCCTTT GTTGATTGAA 60 

TTGACGAAAC CAGAGGTTTT CAAGACAGAT AAGATTGGGT TTATCAAAAC GCAGTTTGAA 120 

ATATTCCAGT TGGTTTCCAA GATATCTTGA AGAAGATTGA CGATTTGAAA TTTGAAGAAG 180 

TGGAGAAGAT CTGGTTTGGA TTGTTGGAGA ATTTCAAGAA TCTCAAGATT TACTCTAACG 240 

ACGGGTACAA CGAGAATTGT ATTGAATTGA TCAAGAACAT GATCTTGGTG TTACAGAACA 300 

TCAAGTTCTT GGACCAGACT GAGAATGCCA CAGATATACA AGGCGTCATG TGATAAAATG 360 

GATGAGATTT ATCCCACAAT TGAAGAAAGA GTTTATGGAA AGTGGTCAAC CAGAAGCTAA 420 

ACAGGAAGAA GCAAACGAAG AGGTGAAACA AGAAGAAGAA GGTAAATAAG TATTTTGTAT 480 

TATATAACAA ACAAAGTAAG GAATACAGAT TTATACAATA AATTGCCATA CTAGTCACGT 540 

GAGATATCTC ATCCATTCCC CAACTCCCAA GAAAAAAAAA AAGTGAAAAA AAAAATCAAA 600 

CCCAAAGATC AACCTCCCCA TCATCATCGT CATCAAACCC CCAGCTCAAT TCGCAATGGT 660 

TAGCACAAAA ACATACACAG AAAGGGCATC AGCACACCCC TCCAAGGTTG CCCAACGTTT 720 

ATTCCGCTTA ATGGAGTCCA AAAAGACCAA CCTCTGCGCC TCGATCGACG TGACCACAAC 780 

CGCCGAGTTC CTTTCGCTCA TCGACAAGCT CGGTCCCCAC ATCTGTCTCG TGAAGACGCA 840 

CATCGATATC ATCTCAGACT TCAGCTACGA GGGCACGATT GAGCCGTTGC TTGTGCTTGC 900 

AGAGCGCCAC GGGTTCTTGA TATTCGAGGA CAGGAAGTTT GCTGATATCG GAAACACCGT 960 

GATGTTGCAG TACACCTCGG GGGTATACCG GATCGCGGCG TGGAGTGACA TCACGAACGC 1020 

GCACGGAGTG ACTGGGAAGG GCGTCGTTGA AGGGTTGAAA CGCGGTGCGG AGGGGGTAGA 1080 

AAAGGAAAGG GGCGTGTTGA TGTTGGCGGA GTTGTCGAGT AAAGGCTCGT TGGCGCATGG 1140 

TGAATATACC CGTGAGACGA TCGAGATTGC GAAGAGTGAT CGGGAGTTCG TGATTGGGTT 1200 

CATCGCGCAG CGGGACATGG GGGGTAGAGA AGAAGGGTTT GATTGGATCA TCATGACGCC 1260 

TGGTGTGGGG TTGGATGATA AAGGCGATGC GTTGGGCCAG CAGTATAGGA CTGTTGATGA 1320 

GGTGGTTCTG ACTGGTACCG ATGTGATTAT TGTCGGGAGA GGGTTGTTTG GAAAAGGAAG 1380 

AGACCCTGAG GTGGAGGGAA AGAGATACAG GGATGCTGGA TGGAAGGCAT ACTTGAAGAG 1440 

AACTGGTCAG TTAGAATAAA TATTGTAATA AATAGGTCTA TATACATACA CTAAGCTTCT 1500 

AGGACGTCAT TGTAGTCTTC GAAGTTGTCT GCTAGTTTAG TTCTCATGAT TTCGAAAACC 1560

AATAACGCAA TGGATGTAGC AGGGATGGTG GTTAGTGCGT TCCTGACAAA CCCAGAGTAC 1620

GCCGCCTCAA ACCACGTCAC ATTCGCCCTT TGCTTCATCC GCATCACTTG CTTGAAGGTA 1680

TCCACGTACG AGTTGTAATA CACCTTGAAG AA 1712

(2) INFORMATION FOR SEQ ID NO: 106:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 267 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: unknown 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 106:

Met Val Ser Thr Lys Thr Tyr Thr Glu Arg Ala Ser Ala His Pro Ser
1 5 10 15

Lys Val Ala Gln Arg Leu Phe Arg Leu Met Glu Ser Lys Lys Thr Asn
20 25 30

Leu Cys Ala Ser Ile Asp Val Thr Thr Thr Ala Glu Phe Leu Ser Leu
35 40 45

Ile Asp Lys Leu Gly Pro His Ile Cys Leu Val Lys Thr His Ile Asp
50 55 60

Ile Ile Ser Asp Phe Ser Tyr Glu Gly Thr Ile Glu Pro Leu Leu Val
65 70 75 80

Leu Ala Glu Arg His Gly Phe Leu Ile Phe Glu Asp Arg Lys Phe Ala
85 90 95

Asp Ile Gly Asn Thr Val Met Leu Gln Tyr Thr Ser Gly Val Tyr Arg
100 105 110

Ile Ala Ala Trp Ser Asp Ile Thr Asn Ala His Gly Val Thr Gly Lys
115 120 125

Gly Val Val Glu Gly Leu Lys Arg Gly Ala Glu Gly Val Glu Lys Glu
130 135 140

Arg Gly Val Leu Met Leu Ala Glu Leu Ser Ser Lys Gly Ser Leu Ala
145 150 155 160

His Gly Glu Tyr Thr Arg Glu Thr Ile Glu Ile Ala Lys Ser Asp Arg
165 170 175

Glu Phe Val Ile Gly Phe Ile Ala Gln Arg Asp Met Gly Gly Arg Glu
180 185 190

Glu Gly Phe Asp Trp Ile Ile Met Thr Pro Gly Val Gly Leu Asp Asp
195 200 205

Lys Gly Asp Ala Leu Gly Gln Gln Tyr Arg Thr Val Asp Glu Val Val
210 215 220

Leu Thr Gly Thr Asp Val Ile Ile Val Gly Arg Gly Leu Phe Gly Lys
225 230 235 240

Gly Arg Asp Pro Glu Val Glu Gly Lys Arg Tyr Arg Asp Ala Gly Trp
245 250 255

Lys Ala Tyr Leu Lys Arg Thr Gly Gln Leu Glu
260 265
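The relationship between SEQ ID NO: 105 and SEQ ID NO: 106 can be checked mechanically. The sketch below is an illustration only, not part of the listing: it translates the first 16 codons of the open reading frame in SEQ ID NO: 105 (the ATG that appears to begin near position 656, copied by hand from the rows above) and compares the result with the first row of SEQ ID NO: 106. The names `translate`, `ORF_START` and `SEQ106_ROW1` are hypothetical, and the standard genetic code is assumed. Candida species are generally reported to read CTG as Ser rather than Leu, but this fragment contains no CTG codon, so that choice does not affect the outcome here.

```python
# Standard genetic code built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# One-letter to three-letter residue names as used in the listing.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}


def translate(dna: str) -> str:
    """Translate complete codons of `dna` into space-separated three-letter codes."""
    usable = len(dna) - len(dna) % 3
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    return " ".join(THREE_LETTER[CODON_TABLE[codon]] for codon in codons)


# Assumption: 48 nucleotides copied from SEQ ID NO: 105, starting at the ATG.
ORF_START = "ATGGTTAGCACAAAAACATACACAGAAAGGGCATCAGCACACCCCTCC"
# First row of SEQ ID NO: 106 as printed in the listing.
SEQ106_ROW1 = "Met Val Ser Thr Lys Thr Tyr Thr Glu Arg Ala Ser Ala His Pro Ser"

if __name__ == "__main__":
    print(translate(ORF_START))
    print(translate(ORF_START) == SEQ106_ROW1)
```

The same function can be pointed at longer stretches of SEQ ID NO: 105 to spot-check further rows, provided any digit-for-letter scanning errors in the nucleotide rows are corrected first.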

(2) INFORMATION FOR SEQ ID NO: 107: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 473 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 107:

GTCAAAGCAA ATTGTTGGCC CAAGCAGACT CTTGGACCAC CGTTGAATGG AACATAAGCC 60

CAGCCCAACT TCTTAGTAGA TGGTTCAAAC CATCTTTCTG GTCTGAAGTC GTTAGCGTCC 120

TTACCGTAGT ATTCTTCCAA ACGGTGGGTC TTGTAGACAA CGTAAGCAAC AGTGGAGCCT 180

TTAGGAATGT AGATTGGGTC GGTACCGTTA GCACCACCAC CTCTTGGCAA AGTGGTGTCT 240

CTGGTGGCGG TTCTAAAGTT GACAGGAACA GATGGGTACA TACGCAAGGT TTCGTTAAGG 300

ATAGCCTTCA AGTATTCACA TCTCTTCAAG GCTTCGAAAG TAATTTCTTC AACGCGGGAG 360 

TCTTCACCAA CACCAAAGTT AACTTCGATT TCTTCTCTCA ACTTGGACCA CATCTCTGGG 420 

TGTCTAGCCA ATTCAAACAA AGCAAAGGAC AACAAACCCG CGGTGGTGTC TCT 473

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
CCTTAATTAA GAGGTCGTTG GTTGAGTTTT C

(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
CCTTAATTAA TTGATAATGA CGTTGCGGG 

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 33 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
AGGCGCGCCG GAGTCCAAAA AGACCAACCT CTG 

(2) INFORMATION FOR SEQ ID NO: 10: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 34 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:
CCTTAATTAA TACGTGGATA CCTTCAAGCA AGTG 

(2) INFORMATION FOR SEQ ID NO: 11: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 35 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
CCTTAATTAA GCTCACGAGT TTTGGGATTT TCGAG 

(2) INFORMATION FOR SEQ ID NO: 12: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 35 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:
GGGTTTAAAC CGCAGAGGTT GGTCTTTTTG GACTC 
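The short oligonucleotides above can be checked against their declared lengths and against SEQ ID NO: 105 with a few lines of code. The following sketch is illustrative only: `reverse_complement`, `SEQ12`, `SEQ105_FRAGMENT` and `PMEI_SITE` are hypothetical names, the 25-nucleotide fragment was copied by hand from the row of SEQ ID NO: 105 numbered 780, and the annotation of GTTTAAAC as a PmeI recognition site is an assumption that the listing itself does not state.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case A/C/G/T string."""
    return dna.translate(COMPLEMENT)[::-1]


# SEQ ID NO: 12 with the spacing between ten-base groups removed.
SEQ12 = "GGGTTTAAACCGCAGAGGTTGGTCTTTTTGGACTC"
# Assumption: fragment of SEQ ID NO: 105 just after the ATG of the reading frame.
SEQ105_FRAGMENT = "GAGTCCAAAAAGACCAACCTCTGCG"
# Assumption: GTTTAAAC is the site commonly associated with PmeI.
PMEI_SITE = "GTTTAAAC"

if __name__ == "__main__":
    # Declared length in the listing is 35 base pairs.
    print(len(SEQ12))
    # The 3' portion of the primer should be the reverse complement of the fragment.
    print(SEQ12.endswith(reverse_complement(SEQ105_FRAGMENT)))
    # The 5' tail should carry the restriction site.
    print(PMEI_SITE in SEQ12[:10])
```

The check only shows that the primer would anneal to the opposite strand of that region; it says nothing about how the primer was actually used.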



